About This Book 


The primary objective of this manual is to help programmers provide software that is 
compatible across the family of PowerPC™ processors. Because the PowerPC architecture 
is designed to be flexible enough to support a broad range of processors, this book provides a 
general description of features that are common to PowerPC processors and indicates those 
features that are optional or that may be implemented differently in the design of each 
processor. 





This revision of this book describes only the 32-bit portion of the PowerPC architecture in 
detail. This book provides a subset of the information provided in PowerPC 
Microprocessor Family: The Programming Environments, which describes both the 64- 
and 32-bit portions of the architecture. Both books reflect changes to the PowerPC 
architecture made subsequent to the publication of PowerPC Microprocessor Family: The 
Programming Environments, Rev. 0 and Rev. 0.1. 


To locate any published errata or updates for this document, refer to the world-wide web at 
http://www.mot.com/powerpc/ or at http://www.chips.ibm.com/products/ppc. 


For designers working with a specific processor, this book should be used in conjunction 
with the user’s manual for that processor. For information regarding variances between a 
processor implementation and the version of the PowerPC architecture reflected in this 
document, see the reference to Implementation Variances Relative to Rev. 1 of The 
Programming Environments Manual described in “PowerPC Documentation,” on page 
xxix. 


This document distinguishes between the three levels, or programming environments, of 
the PowerPC architecture, which are as follows: 


• PowerPC user instruction set architecture (UISA)—The UISA defines the level of 
the architecture to which user-level software should conform. The UISA defines the 
base user-level instruction set, user-level registers, data types, memory conventions, 
and the memory and programming models seen by application programmers. 


• PowerPC virtual environment architecture (VEA)—The VEA, which is the smallest 
component of the PowerPC architecture, defines additional user-level functionality 
that falls outside typical user-level software requirements. The VEA describes the 
memory model for an environment in which multiple processors or other devices can 
access external memory, and defines aspects of the cache model and cache control 
instructions from a user-level perspective. The resources defined by the VEA are 
particularly useful for optimizing memory accesses and for managing resources in 
an environment in which other processors and other devices can access external 
memory. 


Implementations that conform to the PowerPC VEA also adhere to the UISA, but 
may not necessarily adhere to the OEA. 


• PowerPC operating environment architecture (OEA)—The OEA defines supervisor- 
level resources typically required by an operating system. The OEA defines the 
PowerPC memory management model, supervisor-level registers, and the exception 
model. 


Implementations that conform to the PowerPC OEA also conform to the PowerPC 
UISA and VEA. 


TEMPORARY 64-BIT BRIDGE 


The OEA also defines optional features to simplify the migration of 32-bit 
operating systems to 64-bit implementations. This information is not discussed in 
detail in this book, but is discussed as part of the 64-bit architecture in The 
PowerPC Microprocessor Family: The Programming Environments. 


It is important to note that some resources are defined more generally at one level in the 
architecture and more specifically at another. For example, conditions that can cause a 
floating-point exception are defined by the UISA, while the exception mechanism itself is 
defined by the OEA. 


Because it is important to distinguish between the levels of the architecture in order to 
ensure compatibility across multiple platforms, those distinctions are shown clearly 
throughout this book. The level of the architecture to which text refers is indicated in the 
outer margin, using the conventions shown in “Conventions,” on page xxxi. 


This book does not attempt to replace the PowerPC architecture specification, which 
defines the architecture from the perspective of the three programming environments and 
which remains the defining document for the PowerPC architecture. This book reflects 
changes made to the architecture before August 6, 1996. These changes are described in 
Section 1.3, “Changes in This Revision of The Programming Environments Manual.” For 
information about the architecture specification, see “General Information,” on page xxviii. 


For ease of reference, this book and the processor user’s manuals have arranged the 
architecture information into topics that build upon one another, beginning with a 
description and complete summary of registers and instructions (for all three environments) 
and progressing to more specialized topics such as the cache, exception, and memory 
management models. As such, chapters may include information from multiple levels of the 
architecture; for example, the discussion of the cache model uses information from both the 
VEA and the OEA. 
It is beyond the scope of this manual to describe individual PowerPC processors. It must be 
kept in mind that each PowerPC processor is unique in its implementation of the PowerPC 
architecture. 


The information in this book is subject to change without notice, as described in the 
disclaimers on the title page of this book. As with any technical documentation, it is the 
readers’ responsibility to be sure they are using the most recent version of the 
documentation. For more information, contact your sales representative. 


Audience 


This manual is intended for system software and hardware developers and application 
programmers who want to develop products for the PowerPC processors in general. It is 
assumed that the reader understands operating systems, microprocessor system design, and 
the basic principles of RISC processing. 


This revision of this book describes only the 32-bit portion of the PowerPC architecture in 
detail. Readers who need to know more about the architecture specifications for 64-bit 
PowerPC processors should refer to PowerPC Microprocessor Family: The Programming 
Environments, which contains the information for both the 32- and 64-bit 
portions of the architecture. 


Organization 
Following is a summary and a brief description of the major sections of this manual: 


• Chapter 1, “Overview,” is useful for those who want a general understanding of the 
features and functions of the PowerPC architecture. This chapter describes the 
flexible nature of the PowerPC architecture definition and provides an overview of 
how the PowerPC architecture defines the register set, operand conventions, 
addressing modes, instruction set, cache model, exception model, and memory 
management model. 


• Chapter 2, “PowerPC Register Set,” is useful for software engineers who need to 
understand the PowerPC programming model for the three programming 
environments and the functionality of the PowerPC registers. 


• Chapter 3, “Operand Conventions,” describes PowerPC conventions for storing data 
in memory, including information regarding alignment, single- and double- 
precision floating-point conventions, and big- and little-endian byte ordering. 


• Chapter 4, “Addressing Modes and Instruction Set Summary,” provides an overview 
of the PowerPC addressing modes and a description of the PowerPC instructions. 
Instructions are organized by function. 


• Chapter 5, “Cache Model and Memory Coherency,” provides a discussion of the 
cache and memory model defined by the VEA and aspects of the cache model that 
are defined by the OEA. 
• Chapter 6, “Exceptions,” describes the exception model defined in the OEA. 


• Chapter 7, “Memory Management,” provides descriptions of the PowerPC address 
translation and memory protection mechanism as defined by the OEA. 


• Chapter 8, “Instruction Set,” functions as a handbook for the PowerPC instruction 
set. Instructions are sorted by mnemonic. Each instruction description includes the 
instruction formats and an individualized legend that provides such information as 
the level(s) of the PowerPC architecture in which the instruction may be found and 
the privilege level of the instruction. 


• Appendix A, “PowerPC Instruction Set Listings,” lists all the PowerPC instructions. 
Instructions are grouped according to mnemonic, opcode, function, and form. 


• Appendix B, “POWER Architecture Cross Reference,” identifies the differences that 
must be managed in migration from the POWER architecture to the PowerPC 
architecture. 


• Appendix C, “Multiple-Precision Shifts,” describes how multiple-precision shift 
operations can be programmed as defined by the UISA. 


• Appendix D, “Floating-Point Models,” gives examples of how the floating-point 
conversion instructions can be used to perform various conversions as described in 
the UISA. 


• Appendix E, “Synchronization Programming Examples,” gives examples showing 
how synchronization instructions can be used to emulate various synchronization 
primitives and how to provide more complex forms of synchronization. 


• Appendix F, “Simplified Mnemonics,” provides a set of simplified mnemonic 
examples and symbols. 


This manual also includes a glossary and an index. 


Suggested Reading 


This section lists additional reading that provides background for the information in this 
manual as well as general information about the PowerPC architecture. 


General Information 


The following documentation provides useful information about the PowerPC architecture 
and computer architecture in general: 
• The following books are available from the Morgan-Kaufmann Publishers, 340 Pine 
Street, Sixth Floor, San Francisco, CA 94104; Tel. (800) 745-7323 (U.S.A.), (415) 
392-2665 (International); internet address: mkp@mkp.com. 


— The PowerPC Architecture: A Specification for a New Family of RISC 
Processors, Second Edition, by International Business Machines, Inc. 


Updates to the architecture specification are accessible via the world-wide web 
at http://www.austin.ibm.com/tech/ppc-chg.html. 


PowerPC Microprocessor Family: The Programming Environments (32-Bit) 


— PowerPC Microprocessor Common Hardware Reference Platform: A System 
Architecture, by Apple Computer, Inc., International Business Machines, Inc., 
and Motorola, Inc. 


— Macintosh Technology in the Common Hardware Reference Platform, by Apple 
Computer, Inc. 


— Computer Architecture: A Quantitative Approach, Second Edition, by 
John L. Hennessy and David A. Patterson. 


• Inside Macintosh: PowerPC System Software, Addison-Wesley Publishing 
Company, One Jacob Way, Reading, MA, 01867; Tel. (800) 282-2732 (U.S.A.), 
(800) 637-0029 (Canada), (716) 871-6555 (International). 


• PowerPC Programming for Intel Programmers, by Kip McClanahan; IDG Books 
Worldwide, Inc., 919 East Hillsdale Boulevard, Suite 400, Foster City, CA, 94404; 
Tel. (800) 434-3422 (U.S.A.), (415) 655-3022 (International). 


PowerPC Documentation 
The PowerPC documentation is organized in the following types of documents: 


• User’s manuals—These books provide details about individual PowerPC 
implementations and are intended to be used in conjunction with The Programming 
Environments Manual. These include the following: 


— PowerPC 601™ RISC Microprocessor User’s Manual: 
MPC601UM/AD (Motorola order #) 


— PowerPC 602™ RISC Microprocessor User’s Manual: 
MPC602UM/AD (Motorola order #) 


— PowerPC 603e™ RISC Microprocessor User’s Manual with Supplement for 
PowerPC 603 Microprocessor: 
MPC603EUM/AD (Motorola order #) 


— PowerPC 604™ RISC Microprocessor User’s Manual: 
MPC604UM/AD (Motorola order #) 


• PowerPC Microprocessor Family: The Programming Environments, Rev. 1 
provides information about resources defined by the PowerPC architecture that are 
common to PowerPC processors. This document describes both the 64- and 32-bit 
portions of the architecture. 

MPCFPE/AD (Motorola order #) 


• Implementation Variances Relative to Rev. 1 of The Programming Environments 
Manual is available via the world-wide web at http://www.mot.com/powerpc/. 
• Addenda/errata to user’s manuals—Because some processors have follow-on parts, 
an addendum is provided that describes the additional features and changes to 
functionality of the follow-on part. These addenda are intended for use with the 
corresponding user’s manuals. These include the following: 


— Addendum to PowerPC 603e RISC Microprocessor User’s Manual: PowerPC 
603e Microprocessor Supplement and User’s Manual Errata: 
MPC603EUMAD/AD (Motorola order #) 


— Addendum to PowerPC 604 RISC Microprocessor User’s Manual: PowerPC 
604e™ Microprocessor Supplement and User’s Manual Errata: 
MPC604UMAD/AD (Motorola order #) 


• Hardware specifications—Hardware specifications provide specific data regarding 
bus timing, signal behavior, and AC, DC, and thermal characteristics, as well as 
other design considerations for each PowerPC implementation. These include the 
following: 


— PowerPC 601 RISC Microprocessor Hardware Specifications: 
MPC601EC/D (Motorola order #) 


— PowerPC 602 RISC Microprocessor Hardware Specifications: 
MPC602EC/D (Motorola order #) 


— PowerPC 603 RISC Microprocessor Hardware Specifications: 
MPC603EC/D (Motorola order #) 

— PowerPC 603e RISC Microprocessor Family: PID6-603e Hardware 
Specifications: 
MPC603EEC/D (Motorola order #) 

— PowerPC 603e RISC Microprocessor Family: PID7V-603e Hardware 
Specifications: 
MPC603E7VEC/D (Motorola order #) 


— PowerPC 604 RISC Microprocessor Hardware Specifications: 
MPC604EC/D (Motorola order #) 


— PowerPC 604e RISC Microprocessor Family: PID9V-604e Hardware 
Specifications: 
MPC604E9VEC/D (Motorola order #) 

• Technical Summaries—Each PowerPC implementation has a technical summary 
that provides an overview of its features. This document is roughly equivalent to 
the overview (Chapter 1) of an implementation’s user’s manual. Technical 
summaries are available for the 601, 602, 603, 603e, 604, and 604e as well as the 
following: 


— PowerPC 620™ RISC Microprocessor Technical Summary: MPC620/D 
(Motorola order #) 




• PowerPC Microprocessor Family: The Bus Interface for 32-Bit Microprocessors: 
MPCBUSIF/AD (Motorola order #) provides a detailed functional description of the 
60x bus interface, as implemented on the 601, 603, and 604 family of PowerPC 
microprocessors. This document is intended to help system and chipset developers 
by providing a centralized reference source to identify the bus interface presented by 
the 60x family of PowerPC microprocessors. 


• PowerPC Microprocessor Family: The Programmer's Reference Guide: 
MPCPRG/D (Motorola order #) is a concise reference that includes the register 
summary, memory control model, exception vectors, and the PowerPC instruction 
set. 


• PowerPC Microprocessor Family: The Programmer's Pocket Reference Guide: 
MPCPRGREF/D (Motorola order #): This foldout card provides an overview of the 
PowerPC registers, instructions, and exceptions for 32-bit implementations. 


• Application notes—These short documents address specific design issues of 
interest to programmers and engineers working with PowerPC processors. 

• Documentation for support chips—These include the following: 


— MPC105 PCI Bridge/Memory Controller User’s Manual: 
MPC105UM/AD (Motorola order #) 


— MPC106 PCI Bridge/Memory Controller User’s Manual: 
MPC106UM/AD (Motorola order #) 


Additional literature on PowerPC implementations is being released as new processors 
become available. For a current list of PowerPC documentation, refer to the world-wide 
web at http://www.mot.com/powerpc/. 


Conventions 

This document uses the following notational conventions: 

mnemonics Instruction mnemonics are shown in lowercase bold. 

italics Italics indicate variable command parameters, for example, bcctrx. 
Book titles in text are set in italics. 

0x0 Prefix to denote hexadecimal number 

0b0 Prefix to denote binary number 

rA, rB Instruction syntax used to identify a source GPR 

rD Instruction syntax used to identify a destination GPR 

frA, frB, frC Instruction syntax used to identify a source FPR 

frD Instruction syntax used to identify a destination FPR 

REG[FIELD] Abbreviations or acronyms for registers are shown in uppercase text. 
Specific bits, fields, or ranges appear in brackets. For example, 
MSR[LE] refers to the little-endian mode enable bit in the machine 
state register. 

x In certain contexts, such as a signal encoding, this indicates a don’t 
care. 

n Used to express an undefined numerical value 

¬ NOT logical operator 

& AND logical operator 

| OR logical operator 


This symbol identifies text that is relevant with respect to the 
PowerPC user instruction set architecture (UISA). This symbol is 
used both for information that can be found in the UISA specification 
and for explanatory information related to that programming 
environment. 


This symbol identifies text that is relevant with respect to the 
PowerPC virtual environment architecture (VEA). This symbol is 
used both for information that can be found in the VEA specification 
and for explanatory information related to that programming 
environment. 


This symbol identifies text that is relevant with respect to the 
PowerPC operating environment architecture (OEA). This symbol is 
used both for information that can be found in the OEA specification 
and for explanatory information related to that programming 
environment. 


0000 Indicates reserved bits or bit fields in a register. Although these bits 
may be written to as either ones or zeros, they are always read as 
zeros. 


TEMPORARY 64-BIT BRIDGE 


Text that pertains to the optional 64-bit bridge defined by the OEA 
is presented with a grayed background, as shown here. This 
information is not discussed in detail in this book, but is discussed 
as part of the 64-bit architecture in The PowerPC Microprocessor 
Family: The Programming Environments. 


Additional conventions used with instruction encodings are described in Table 8-2 on page 
8-2. Conventions used for pseudocode examples are described in Table 8-3 on page 8-4. 
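As an illustration of the REG[FIELD] notation, the following Python sketch extracts a bit or field from a register value. It assumes the PowerPC convention that bit 0 is the most-significant bit of a register, and it assumes that the 32-bit MSR places LE at bit 31 and EE at bit 16 (see Chapter 2 for the authoritative register layout). The helper names get_field and get_bit are hypothetical and are not part of any PowerPC tool or library. Python's 0x and 0b literal prefixes happen to match the hexadecimal and binary notation above.

```python
REG_WIDTH = 32  # register width in the 32-bit portion of the architecture


def get_field(value: int, first: int, last: int, width: int = REG_WIDTH) -> int:
    """Return bits first..last of value, where bit 0 is the most-significant bit."""
    if not 0 <= first <= last < width:
        raise ValueError("bit range out of bounds")
    length = last - first + 1
    # Number of positions between the field's last bit and the register's
    # least-significant bit (bit width-1 in PowerPC numbering).
    shift = width - 1 - last
    return (value >> shift) & ((1 << length) - 1)


def get_bit(value: int, bit: int, width: int = REG_WIDTH) -> int:
    """Return a single bit, again numbering bit 0 as the most-significant bit."""
    return get_field(value, bit, bit, width)


# Assumed MSR bit positions for 32-bit implementations (see Chapter 2).
MSR_EE = 16  # external interrupt enable
MSR_LE = 31  # little-endian mode enable

msr = 0b1  # binary literal: only the least-significant bit is set
print(get_bit(msr, MSR_LE))              # 1, so MSR[LE] is set
print(get_bit(msr, MSR_EE))              # 0
print(hex(get_field(0x12345678, 0, 7)))  # 0x12, since bits 0-7 are the most-significant byte
```

Because the numbering runs from the most-significant end, a field's shift amount depends on the register width, which is why the width is an explicit parameter.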
Acronyms and Abbreviations 


Table i contains acronyms and abbreviations that are used in this document. Note that the 
meanings for some acronyms (such as SDR1 and XER) are historical, and the words for 
which an acronym stands may not be intuitively obvious. 


Table i. Acronyms and Abbreviated Terms 


UISA User instruction set architecture 
VA Virtual address 


Terminology Conventions 


Table ii lists certain terms used in this manual that differ from the architecture terminology 
conventions. 





Table ii. Terminology Conventions 


The Architecture Specification           This Manual 
Data storage interrupt (DSI)             DSI exception 
Instruction storage interrupt (ISI)      ISI exception 
Privileged mode (or privileged state)    Supervisor-level privilege 
Problem mode (or problem state)          User-level privilege 
Table iii describes instruction field notation conventions used in this manual. 


Table iii. Instruction Field Conventions 


The Architecture Specification    Equivalent to: 
BA, BB, BT                        crbA, crbB, crbD (respectively) 
BF, BFA                           crfD, crfS (respectively) 
FRA, FRB, FRC, FRT, FRS           frA, frB, frC, frD, frS (respectively) 
RA, RB, RT, RS                    rA, rB, rD, rS (respectively) 
/, //, ///                        0...0 (shaded) 
Chapter 1 
Overview 


The PowerPC™ architecture provides a software model that ensures software compatibility 
among implementations of the PowerPC family of microprocessors. In this document, and 
in other PowerPC documentation as well, the term ‘implementation’ refers to a hardware 
device (typically a microprocessor) that complies with the specifications defined by the 
architecture. 


The PowerPC architecture is a 64-bit architecture with a 32-bit subset. This manual 
describes the architecture from a 32-bit perspective. Although some 64-bit resources are 
discussed, this manual does not completely describe details of the 64-bit-only features of 
the architecture, in particular with respect to the memory management model, registers, and 
instruction set. For more information about the 64-bit aspects of the PowerPC architecture, 
refer to PowerPC Microprocessor Family: The Programming Environments, which 
contains the information in this book as well. 


In general, the architecture defines the following: 


• Instruction set—The instruction set specifies the families of instructions (such as 
load/store, integer arithmetic, and floating-point arithmetic instructions), the specific 
instructions, and the forms used for encoding the instructions. The instruction set 
definition also specifies the addressing modes used for accessing memory. 


• Programming model—The programming model defines the register set and the 
memory conventions, including details regarding the bit and byte ordering, and the 
conventions for how data (such as integer and floating-point values) are stored. 


• Memory model—The memory model defines the size of the address space and of the 
subdivisions (pages and blocks) of that address space. It also defines the ability to 
configure pages and blocks of memory with respect to caching, byte ordering (big- 
or little-endian), coherency, and various types of memory protection. 


• Exception model—The exception model defines the common set of exceptions and 
the conditions that can generate those exceptions. The exception model specifies 
characteristics of the exceptions, such as whether they are precise or imprecise, 
synchronous or asynchronous, and maskable or nonmaskable. The exception model 
defines the exception vectors and a set of registers used when exceptions are taken. 
The exception model also provides memory space for implementation-specific 
exceptions. (Note that exceptions are referred to as interrupts in the architecture 
specification.) 
• Memory management model—The memory management model defines how 
memory is partitioned, configured, and protected. The memory management model 
also specifies how memory translation is performed, the real, virtual, and physical 
address spaces, special memory control instructions, and other characteristics. 
(Physical address is referred to as real address in the architecture specification.) 


• Time-keeping model—The time-keeping model defines facilities that permit the 
time of day to be determined and the resources and mechanisms required for 
supporting time-related exceptions. 
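To make the memory management model more concrete, the following sketch splits a 32-bit effective address into the fields used for page address translation. It assumes the segmented 32-bit model covered in Chapter 7 (16 segment registers selected by EA[0-3], 4-Kbyte pages, and a 12-bit byte offset) and ignores block address translation and the 64-bit model. The EASplit type and split_effective_address function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class EASplit(NamedTuple):
    segment: int      # EA[0-3]: selects one of 16 segment registers (assumed)
    page_index: int   # EA[4-19]: page within a 256-Mbyte segment (assumed)
    byte_offset: int  # EA[20-31]: byte within a 4-Kbyte page (assumed)


def split_effective_address(ea: int) -> EASplit:
    """Split a 32-bit effective address into segment, page, and offset fields."""
    if not 0 <= ea <= 0xFFFF_FFFF:
        raise ValueError("effective address must fit in 32 bits")
    return EASplit(
        segment=ea >> 28,                # top 4 bits
        page_index=(ea >> 12) & 0xFFFF,  # next 16 bits
        byte_offset=ea & 0xFFF,          # low 12 bits
    )


print(split_effective_address(0x8123_4567))
# EASplit(segment=8, page_index=4660, byte_offset=1383)
```

Only the page index takes part in the page lookup; the byte offset passes through translation unchanged, which is why it is kept separate here.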


These aspects of the PowerPC architecture are defined at different levels of the architecture, 
and this chapter provides an overview of those levels—the user instruction set architecture 
(UISA), the virtual environment architecture (VEA), and the operating environment 
architecture (OEA). 


To locate any published errata or updates for this document, refer to the website at 
http://www.mot.com/powerpc/ or at http://www.chips.ibm.com/products/ppc. 


1.1 PowerPC Architecture Overview 


The PowerPC architecture, developed jointly by Motorola, IBM, and Apple Computer, is 
based on the POWER architecture implemented by the RS/6000™ family of computers. The 
PowerPC architecture takes advantage of recent technological advances in such areas as 
process technology, compiler design, and reduced instruction set computing (RISC) 
microprocessor design to provide software compatibility across a diverse family of 
implementations, primarily single-chip microprocessors, intended for a wide range of 
systems, including battery-powered personal computers; embedded controllers; high-end 
scientific and graphics workstations; and multiprocessing, microprocessor-based 
mainframes. 


To provide a single architecture for such a broad assortment of processor environments, the 
PowerPC architecture is both flexible and scalable. 


The flexibility of the PowerPC architecture offers many price/performance options. 
Designers can choose whether to implement architecturally-defined features in hardware or 
in software. For example, a processor designed for a high-end workstation has greater need 
for the performance gained from implementing floating-point normalization and 
denormalization in hardware than a battery-powered, general-purpose computer might. 


The PowerPC architecture is scalable to take advantage of continuing technological 
advances—for example, the continued miniaturization of transistors makes it more feasible 
to implement more execution units and a richer set of optimizing features without being 
constrained by the architecture. 
The PowerPC architecture defines the following features: 


• Separate 32-entry register files for integer and floating-point instructions. The 
general-purpose registers (GPRs) hold source data for integer arithmetic 
instructions, and the floating-point registers (FPRs) hold source and target data for 
floating-point arithmetic instructions. 


• Instructions for loading and storing data between the memory system and either the 
FPRs or GPRs 


• Uniform-length instructions to allow simplified instruction pipelining and parallel 
processing instruction dispatch mechanisms 


• Nondestructive use of registers for arithmetic instructions, in which the second, third, 
and sometimes fourth operands typically specify source registers for calculations 
whose results are stored in the target register specified by the first operand. 


• A precise exception model (with the option of treating floating-point exceptions 
imprecisely) 


• Floating-point support that includes IEEE-754 floating-point operations 


• A flexible architecture definition that allows certain features to be performed 
either in hardware or with assistance from implementation-specific software, 
depending on the needs of the processor design 


• The ability to perform both single- and double-precision floating-point operations 


• User-level instructions for explicitly storing, flushing, and invalidating data in the 
on-chip caches. The architecture also defines special instructions (cache block touch 
instructions) for speculatively loading data before it is needed, reducing the effect of 
memory latency. 


• Definition of a memory model that allows weakly-ordered memory accesses. This 
allows bus operations to be reordered dynamically, which improves overall 
performance and in particular reduces the effect of memory latency on instruction 
throughput. 


• Support for separate instruction and data caches (Harvard architecture) and for 
unified caches 


• Support for both big- and little-endian addressing modes 


• Support for 64-bit addressing. The architecture supports both 32-bit and 64-bit 
implementations. This document describes the 32-bit portion of the PowerPC 
architecture. For information about the 64-bit architecture, see PowerPC 
Microprocessor Family: The Programming Environments. 
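Several of these features (uniform-length 32-bit instructions, a target register kept separate from the source operands, and selectable byte ordering) can be seen by decoding a single instruction word. The sketch below assumes the D-form layout used by instructions such as addi (a 6-bit primary opcode, two 5-bit register fields, and a 16-bit signed immediate) and assumes 0x38600001 as the encoding of li r3,1, that is, addi with rA = 0; consult Chapter 8 for the authoritative formats. The decode_d_form function is a hypothetical helper. The last line only shows how a word's bytes are reversed in little-endian order; it does not model the processor's little-endian mode.

```python
def decode_d_form(word: int) -> dict:
    """Decode the fields of an assumed 32-bit D-form instruction word.

    Bit numbering follows the PowerPC convention: bit 0 is the most-significant bit.
    """
    opcode = word >> 26          # bits 0-5: primary opcode
    rd = (word >> 21) & 0x1F     # bits 6-10: target register (first operand)
    ra = (word >> 16) & 0x1F     # bits 11-15: source register (second operand)
    imm = word & 0xFFFF          # bits 16-31: 16-bit immediate
    simm = imm - 0x10000 if imm & 0x8000 else imm  # sign-extend the immediate
    return {"opcode": opcode, "rD": rd, "rA": ra, "SIMM": simm}


# Big-endian byte image of the assumed encoding of li r3,1.
image = bytes([0x38, 0x60, 0x00, 0x01])
word = int.from_bytes(image, "big")
print(hex(word), decode_d_form(word))
# 0x38600001 {'opcode': 14, 'rD': 3, 'rA': 0, 'SIMM': 1}

# The same 32-bit value stored in little-endian byte order has its bytes reversed.
print(word.to_bytes(4, "little").hex())  # 01006038
```

Because every instruction is exactly four bytes, a decoder can locate each field with fixed shifts and masks, without first determining the instruction's length.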
This chapter provides an overview of the major characteristics of the PowerPC architecture 
in the order in which they are addressed in this book: 


• Register set and programming model 
• Instruction set and addressing modes 
• Cache implementations 
• Exception model 
• Memory management 


1.1.1 The 64-Bit PowerPC Architecture and the 32-Bit Subset 


The PowerPC architecture is a 64-bit architecture with a 32-bit subset. It is important to 
distinguish the following modes of operation: 


• 64-bit implementations/64-bit mode—The PowerPC architecture provides 64-bit 
addressing, 64-bit integer data types, and instructions that perform arithmetic 
operations on those data types, as well as other features to support the wider 
addressing range. For example, memory management differs somewhat between 32- 
and 64-bit processors. The processor is configured to operate in 64-bit mode by 
setting the SF bit in the machine state register (MSR[SF]). 


Processors that implement only the 32-bit portion of the PowerPC architecture 
provide 32-bit effective addresses, which is also the maximum size of integer data 
types. 

• 64-bit implementations/32-bit mode—For compatibility with 32-bit 
implementations, 64-bit implementations can be configured to operate in 32-bit 
mode by clearing the MSR[SF] bit. In 32-bit mode, the effective address is treated 
as a 32-bit address; condition bits, such as the overflow and carry bits, are set based 
on 32-bit arithmetic (for example, integer overflow occurs when the result exceeds 
32 bits); and the count register (CTR) is tested by branch conditional instructions 
following the conventions for 32-bit implementations. All applications written for 
32-bit implementations will run without modification on 64-bit processors running 
in 32-bit mode. 
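The effect of the mode on the carry and overflow conditions can be sketched by computing the same addition at two widths. The Python function below is a simplified model, not a description of any processor's datapath: add_with_flags is a hypothetical helper that treats its arguments as raw register contents, returns only the low-order width bits of the sum, and derives CA (unsigned carry out of the most-significant bit) and OV (signed two's-complement overflow) from that width. How the upper half of a 64-bit register is handled in 32-bit mode is not modeled.

```python
def add_with_flags(a: int, b: int, width: int) -> tuple[int, int, int]:
    """Return (result, CA, OV) for a + b computed at the given width.

    width is 32 to model 32-bit mode or 64 to model 64-bit mode.
    Only the low-order `width` bits of each operand are used.
    """
    mask = (1 << width) - 1
    ua, ub = a & mask, b & mask
    total = ua + ub
    result = total & mask
    carry = 1 if total > mask else 0  # unsigned carry out of the most-significant bit

    def signed(x: int) -> int:
        # Reinterpret `width` bits as a two's-complement value.
        return x - (1 << width) if x >> (width - 1) else x

    true_sum = signed(ua) + signed(ub)
    in_range = -(1 << (width - 1)) <= true_sum < (1 << (width - 1))
    overflow = 0 if in_range else 1
    return result, carry, overflow


print(add_with_flags(0x7FFF_FFFF, 1, 32))  # (2147483648, 0, 1): signed overflow at 32 bits
print(add_with_flags(0x7FFF_FFFF, 1, 64))  # (2147483648, 0, 0): same sum fits in 64 bits
print(add_with_flags(0xFFFF_FFFF, 1, 32))  # (0, 1, 0): carry out, but no signed overflow
```

The same operands produce different CA and OV values depending only on the width, which is the behavior the mode bit selects.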
1.1.2 The Levels of the PowerPC Architecture 


The PowerPC architecture is defined in three levels that correspond to three programming 
environments, roughly described from the most general, user-level instruction set 
environment, to the more specific, operating environment. 


This layering of the architecture provides flexibility, allowing degrees of software 
compatibility across a wide range of implementations. For example, an implementation 
such as an embedded controller may support the user instruction set, whereas it may be 
impractical for it to adhere to the memory management, exception, and cache models. 


The three levels of the PowerPC architecture are defined as follows: 


• PowerPC user instruction set architecture (UISA)—The UISA defines the level of 
the architecture to which user-level (referred to as problem state in the architecture 
specification) software should conform. The UISA defines the base user-level 
instruction set, user-level registers, data types, floating-point memory conventions 
and exception model as seen by user programs, and the memory and programming 
models. The icon shown in the margin identifies text that is relevant with respect to 
the UISA. 


• PowerPC virtual environment architecture (VEA)—The VEA defines additional 
user-level functionality that falls outside typical user-level software requirements. 
The VEA describes the memory model for an environment in which multiple 
devices can access memory, defines aspects of the cache model, defines cache 
control instructions, and defines the time base facility from a user-level perspective. 
The icon shown in the margin identifies text that is relevant with respect to the VEA. 


Implementations that conform to the PowerPC VEA also adhere to the UISA, but 
may not necessarily adhere to the OEA. 


• PowerPC operating environment architecture (OEA)—The OEA defines supervisor- 
level (referred to as privileged state in the architecture specification) resources 
typically required by an operating system. The OEA defines the PowerPC memory 
management model, supervisor-level registers, synchronization requirements, and 
the exception model. The OEA also defines the time base feature from a supervisor- 
level perspective. The icon shown in the margin identifies text that is relevant with 
respect to the OEA. 


Implementations that conform to the PowerPC OEA also conform to the PowerPC 
UISA and VEA. 


Implementations that adhere to the VEA level are guaranteed to adhere to the UISA level; 
likewise, implementations that conform to the OEA level are also guaranteed to conform to 
the UISA and the VEA levels. 


All PowerPC devices adhere to the UISA, offering compatibility among all PowerPC 
application programs. However, there may be different versions of the VEA and OEA than 
those described here. For example, some devices, such as embedded controllers, may not 
require some of the features as defined by this VEA and OEA, and may implement a 
simpler or modified version of those features. 


The general-purpose PowerPC microprocessors developed jointly by Motorola and IBM 
(such as the PowerPC 601™, PowerPC 603™, PowerPC 603e™, PowerPC 604™, 
PowerPC 604e™, and PowerPC 620™ microprocessors) comply both with the UISA and 
with the VEA and OEA discussed here. In this book, these three levels of the architecture 
are referred to collectively as the PowerPC architecture. 


The distinctions between the levels of the PowerPC architecture are maintained clearly 
throughout this document, using the conventions described in the section “Conventions,” 
on page xxxi of the Preface. 


1.1.3 Latitude Within the Levels of the PowerPC Architecture 


The PowerPC architecture defines those parameters necessary to ensure compatibility 
among PowerPC processors, but also allows a wide range of options for individual 
implementations. These are as follows: 


• The PowerPC architecture defines some facilities (such as registers, bits within 
registers, instructions, and exceptions) as optional. 


• The PowerPC architecture allows implementations to define additional privileged 
special-purpose registers (SPRs), exceptions, and instructions for special system 
requirements (such as power management in processors designed for very low- 
power operation). 


• There are many other parameters that the PowerPC architecture allows 
implementations to define. For example, the PowerPC architecture may define 
conditions for which an exception may be taken, such as alignment conditions. A 
particular implementation may choose to solve the alignment problem without 
taking the exception. 


• Processors may implement any architectural facility or instruction with assistance 
from software (that is, they may trap and emulate) as long as the results (aside from 
performance) are identical to those specified by the architecture. 


• Some parameters are defined at one level of the architecture and defined more 
specifically at another. For example, the UISA defines conditions that may cause an 
alignment exception, and the OEA specifies the exception itself. 


Because of updates to the PowerPC architecture specification, which are described in this 
document, variances may result between existing devices and the revised architecture 
specification. Those variances are included in Implementation Variances Relative to Rev. 1 
of The Programming Environments Manual. 


1.1.4 Features Not Defined by the PowerPC Architecture 


Because flexibility is an important design goal of the PowerPC architecture, there are many 
aspects of the processor design, typically relating to the hardware implementation, that the 
PowerPC architecture does not define, such as the following: 


• System bus interface signals—Although numerous implementations may have 
similar interfaces, the PowerPC architecture does not define individual signals or the 
bus protocol. For example, the OEA allows each implementation to determine the 
signal or signals that trigger the machine check exception. 


• Cache design—The PowerPC architecture does not define the size, the structure, the 
replacement algorithm, or the mechanism used for maintaining cache coherency. 
The PowerPC architecture supports, but does not require, the use of separate 
instruction and data caches. 


• The number and the nature of execution units—The PowerPC architecture is a RISC 
architecture, and as such has been designed to facilitate the design of processors that 
use pipelining and parallel execution units to maximize instruction throughput. 
However, the PowerPC architecture does not define the internal hardware details of 
implementations. For example, one processor may execute load and store operations 
in the integer unit, while another may execute these instructions in a dedicated 
load/store unit. 


• Other internal microarchitecture issues—The PowerPC architecture does not 
prescribe which execution unit is responsible for executing a particular instruction; 
it also does not define details regarding the instruction fetching mechanism, how 
instructions are decoded and dispatched, and how results are written back. Dispatch 
and write-back may occur in order or out of order. Also, while the architecture 
specifies certain registers, such as the GPRs and FPRs, implementations can 
implement register renaming or other schemes to reduce the impact of data 
dependencies and register contention. 


1.1.5 Summary of Architectural Changes in this Revision 


This revision reflects enhancements to the architecture that have been made since the 
publication of the PowerPC Microprocessor Family: The Programming Environments, 
Rev. 0.1. The primary difference described in this document is the addition of the rfid and 
mtmsrd instructions to the 64-bit portion of the architecture. The rfi and mtmsr 
instructions are now legal in 32-bit processors and illegal in 64-bit processors. Likewise, 
the rfid and mtmsrd instructions are valid only in 64-bit processors and are illegal in 
32-bit processors. 


In addition, this book reflects smaller changes and clarifications to the PowerPC 
architecture. For more information, see Section 1.3, “Changes in This Revision of The 
Programming Environments Manual.” 


1.2 The PowerPC Architectural Models 


This section provides overviews of aspects defined by the PowerPC architecture, following 
the same order as the rest of this book. The topics include the following: 

• PowerPC registers and programming model 

• PowerPC operand conventions 

• PowerPC instruction set and addressing modes 

• PowerPC cache model 

• PowerPC exception model 

• PowerPC memory management model 


1.2.1 PowerPC Registers and Programming Model 


The PowerPC architecture defines register-to-register operations for computational 
instructions. Source operands for these instructions are accessed from the architected 
registers or are provided as immediate values embedded in the instruction. The three- 
register instruction format allows specification of a target register distinct from two source 
operand registers. This scheme allows efficient code scheduling in a highly parallel 
processor. Load and store instructions are the only instructions that transfer data between 
registers and memory. The PowerPC registers are shown in Figure 1-1. 
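To make the register-to-register model concrete, the following C sketch uses a hypothetical toy machine (`struct toy_cpu`, `toy_lwz`, `toy_add`, and `toy_stw` are illustrative names, not an existing API or a PowerPC simulator). Only the load and store helpers touch memory, and the add helper writes a target register distinct from its two sources, so both source values remain available afterward.

```c
#include <stdint.h>

/* Hypothetical toy machine used only to illustrate the load/store,
 * three-register model. Callers must keep ea + 3 below 64. */
struct toy_cpu {
    uint32_t gpr[32];  /* general-purpose registers */
    uint8_t  mem[64];  /* small byte-addressable memory */
};

/* Like lwz: the only way data moves from memory into a register.
 * Bytes are combined in big-endian order (the PowerPC default). */
static void toy_lwz(struct toy_cpu *c, unsigned rd, uint32_t ea)
{
    c->gpr[rd] = ((uint32_t)c->mem[ea] << 24) | ((uint32_t)c->mem[ea + 1] << 16) |
                 ((uint32_t)c->mem[ea + 2] << 8) | (uint32_t)c->mem[ea + 3];
}

/* Like add rD,rA,rB: registers only, and the target may differ from
 * both sources, so rA and rB keep their values. */
static void toy_add(struct toy_cpu *c, unsigned rd, unsigned ra, unsigned rb)
{
    c->gpr[rd] = c->gpr[ra] + c->gpr[rb];
}

/* Like stw: the only way data moves from a register back to memory. */
static void toy_stw(struct toy_cpu *c, unsigned rs, uint32_t ea)
{
    uint32_t v = c->gpr[rs];
    c->mem[ea]     = (uint8_t)(v >> 24);
    c->mem[ea + 1] = (uint8_t)(v >> 16);
    c->mem[ea + 2] = (uint8_t)(v >> 8);
    c->mem[ea + 3] = (uint8_t)v;
}
```

For example, after loading 5 and 7 into r4 and r5, calling `toy_add(&c, 3, 4, 5)` leaves 12 in r3 while r4 and r5 still hold 5 and 7.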


SUPERVISOR MODEL—OEA
  Configuration Registers
    Machine State Register (MSR)
    Processor Version Register (PVR)
  Memory Management Registers
    8 Instruction BAT Registers (IBATs)
    8 Data BAT Registers (DBATs)
    SDR1
    16 Segment Registers (SRs)¹
  Exception Handling Registers
    Data Address Register (DAR)
    DSISR
    Save and Restore Registers (SRR0/SRR1)
    SPRG0-SPRG3
    Floating-Point Exception Cause Register (FPECR)²
  Miscellaneous Registers
    Time Base Facility (TBU and TBL) (for writing)
    Decrementer Register (DEC)
    Data Address Breakpoint Register (DABR)²
    Processor Identification Register (PIR)²
    External Access Register (EAR)²

USER MODEL—UISA
  32 General-Purpose Registers (GPRs)
  32 Floating-Point Registers (FPRs)
  Condition Register (CR)
  Floating-Point Status and Control Register (FPSCR)
  XER
  Link Register (LR)
  Count Register (CTR)

USER MODEL—VEA
  Time Base Facility (TBU and TBL) (for reading)

¹ 32-bit implementations only
² Optional

Figure 1-1. Programming Model—PowerPC Registers 


The programming model incorporates 32 GPRs, 32 FPRs, special-purpose registers 
(SPRs), and several miscellaneous registers. Each implementation may have its own unique 
set of hardware implementation-dependent (HID) registers that are not defined by the 
architecture. 


PowerPC processors have two levels of privilege: 


• Supervisor mode—used exclusively by the operating system. Resources defined by 
the OEA can be accessed only by supervisor-level software. 


• User mode—used by the application software and operating system software. Only 
resources defined by the UISA and VEA can be accessed by user-level software. 


These two levels govern the access to registers, as shown in Figure 1-1. The division of 
privilege allows the operating system to control the application environment (providing 
virtual memory and protecting operating system and critical machine resources). 
Instructions that control the state of the processor, the address translation mechanism, and 
supervisor registers can be executed only when the processor is operating in supervisor 
mode. 


• User Instruction Set Architecture Registers—All UISA registers can be accessed 
by all software with either user or supervisor privileges. These registers include the 
32 general-purpose registers (GPRs) and the 32 floating-point registers (FPRs), and 
other registers used for integer, floating-point, and branch instructions. 


• Virtual Environment Architecture Registers—The VEA defines the user-level 
portion of the time base facility, which consists of the two 32-bit time base registers. 
These registers can be read by user-level software, but can be written to only by 
supervisor-level software. 


• Operating Environment Architecture Registers—SPRs defined by the OEA are 
used for system-level operations such as memory management, exception handling, 
and time-keeping. 
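Because a 32-bit implementation reads the 64-bit time base as two 32-bit halves, a carry from TBL into TBU between the two reads could yield an inconsistent value. A commonly used approach is to read TBU, then TBL, then TBU again, and retry if the upper half changed. The C sketch below shows that loop; on real hardware the reads would be the mftbu and mftb instructions, which are replaced here by hypothetical function pointers and a simulated counter (`struct sim_tb`) so the logic can run on any host.

```c
#include <stdint.h>

/* Hypothetical accessor type standing in for the mftbu / mftb instructions. */
typedef uint32_t (*tb_read_fn)(void *ctx);

/* Read the 64-bit time base as two 32-bit halves. If TBU differs between
 * the first and second upper reads, TBL carried into TBU in between, so
 * the pair may be inconsistent and the reads are retried. */
static uint64_t read_time_base(tb_read_fn read_tbu, tb_read_fn read_tbl, void *ctx)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_tbu(ctx);
        lo  = read_tbl(ctx);
        hi2 = read_tbu(ctx);
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}

/* Simulated time base for testing: advances by `step` ticks on every read. */
struct sim_tb { uint64_t value; uint64_t step; };

static uint32_t sim_read_tbu(void *ctx)
{
    struct sim_tb *t = ctx;
    uint32_t v = (uint32_t)(t->value >> 32);
    t->value += t->step;
    return v;
}

static uint32_t sim_read_tbl(void *ctx)
{
    struct sim_tb *t = ctx;
    uint32_t v = (uint32_t)t->value;
    t->value += t->step;
    return v;
}
```

Starting the simulated counter at 0x00000001FFFFFFFF, the first pass reads TBU = 1 and then TBU = 2, so it is discarded; the second pass returns the consistent value 0x0000000200000003.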


The PowerPC architecture also provides room in the SPR space for implementation- 
specific registers, typically referred to as HID registers. Individual HIDs are not discussed 
in this manual. 


1.2.2 Operand Conventions 


Operand conventions are defined in two levels of the PowerPC architecture—user 
instruction set architecture (UISA) and virtual environment architecture (VEA). These 
conventions define how data is stored in registers and memory. 


1.2.2.1 Byte Ordering 


The default mapping for PowerPC processors is big-endian, but the UISA provides the 
option of operating in either big- or little-endian mode. Big-endian byte ordering is shown 
in Figure 1-2. 


Figure 1-2. Big-Endian Byte and Bit Ordering 


The OEA defines two bits in the MSR for specifying byte ordering—LE (little-endian 
mode) and ILE (exception little-endian mode). The LE bit specifies whether the processor 
is configured for big-endian or little-endian mode; the ILE bit specifies the mode to be used 
when an exception is taken, because it is copied into the LE bit of the MSR at that time. A 
value of 0 specifies big-endian mode and a value of 1 specifies little-endian mode. 
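As an illustration, the sketch below stores the same 32-bit word in both byte orders and models the single MSR update described above, copying ILE into LE when an exception is taken. The masks assume a 32-bit implementation, where the architecture numbers MSR bits from 0 (most significant), placing ILE at bit 15 and LE at bit 31. The helper names are hypothetical, and the rest of exception processing is not modeled.

```c
#include <stdint.h>

/* Big-endian: the most-significant byte goes to the lowest address. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Little-endian: the least-significant byte goes to the lowest address. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* MSR masks on 32-bit implementations (bit 0 is the most-significant bit):
 * ILE is MSR bit 15 and LE is MSR bit 31. */
#define MSR_ILE 0x00010000u
#define MSR_LE  0x00000001u

/* Only the byte-order part of taking an exception: the new MSR[LE] is a
 * copy of MSR[ILE]. All other MSR effects are deliberately omitted. */
static uint32_t msr_le_after_exception(uint32_t msr)
{
    return (msr & MSR_ILE) ? (msr | MSR_LE) : (msr & ~MSR_LE);
}
```

Storing 0x0A0B0C0D big-endian places 0x0A at the lowest address; storing it little-endian places 0x0D there.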


1.2.2.2 Data Organization in Memory and Data Transfers 


Bytes in memory are numbered consecutively starting with 0. Each number is the address 
of the corresponding byte. 


Memory operands may be bytes, half words, words, or double words, or, for the load/store 
string/multiple instructions, a sequence of bytes or words. The address of a multiple-byte 
memory operand is the address of its first byte (that is, of its lowest-numbered byte). 
Operand length is implicit for each instruction. 


The operand of a single-register memory access instruction has a natural alignment 
boundary equal to the operand length. In other words, the natural address of an operand is 
an integral multiple of the operand length. A memory operand is said to be aligned if it is 
aligned at its natural boundary; otherwise it is misaligned. 
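A minimal C sketch of the alignment rule, assuming operand lengths of 1, 2, 4, or 8 bytes; the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an operand of len bytes at effective address ea is aligned at its
 * natural boundary, that is, ea is an integral multiple of len. len is
 * assumed to be 1, 2, 4, or 8 (byte, half word, word, double word). */
static bool naturally_aligned(uint32_t ea, uint32_t len)
{
    return (ea % len) == 0;
}

/* A multiple-byte operand is addressed by its lowest-numbered byte, so its
 * bytes occupy ea through ea + len - 1. */
static uint32_t last_byte_address(uint32_t ea, uint32_t len)
{
    return ea + len - 1;
}
```

For example, a word at 0x1002 is misaligned, while a half word at the same address is aligned.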


1.2.2.3 Floating-Point Conventions 


The PowerPC architecture adheres to the IEEE 754 standard for 64- and 32-bit floating-point 
arithmetic: 


• Double-precision arithmetic instructions may have single- or double-precision 
operands but always produce double-precision results. 

• Single-precision arithmetic instructions require all operands to be single-precision 
values and always produce single-precision results. Single-precision values are 
stored in double-precision format in the FPRs—these values are rounded such that 
they can be represented in 32-bit, single-precision format (as they are in memory). 
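The effect of holding a single-precision result in a double-precision FPR can be approximated in C by rounding to `float` and widening back to `double`. This sketch assumes the host C implementation uses IEEE 754 binary32 and binary64 formats with round-to-nearest; the helper name is hypothetical.

```c
/* Round x to single precision, then widen it back to double precision,
 * as a single-precision result is held in a double-format FPR. The
 * widening step is exact, so no information is added or lost there. */
static double round_to_single(double x)
{
    return (double)(float)x;
}
```

Because widening from single to double precision is exact, rounding the result a second time leaves it unchanged, which is what it means for the FPR contents to be representable in single-precision format. A double such as 1 + 2^-30, which needs more than 24 significand bits, rounds to 1.0.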


1.2.3 PowerPC Instruction Set and Addressing Modes 


All PowerPC instructions are encoded as single-word (32-bit) instructions. Instruction 
formats are consistent among all instruction types, permitting decoding to occur in parallel 
with operand accesses. This fixed instruction length and consistent format greatly simplify 
instruction pipelining. 
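Because every instruction is a single 32-bit word with its fields in fixed positions, a decoder can extract fields with shifts and masks before it knows which instruction it has. The sketch below decodes only the D-form layout (primary opcode, two 5-bit register fields, and a 16-bit immediate), using the architecture's convention that bit 0 is the most-significant bit; the struct and function names are hypothetical. For example, 0x38640001 is addi r3,r4,1, whose primary opcode is 14.

```c
#include <stdint.h>

/* Fields of a D-form instruction. Bit 0 is the most-significant bit. */
struct dform {
    unsigned opcode;  /* bits 0-5: primary opcode */
    unsigned rd;      /* bits 6-10: rD (or rS) */
    unsigned ra;      /* bits 11-15: rA */
    int32_t  simm;    /* bits 16-31: immediate, sign-extended */
};

static struct dform decode_dform(uint32_t insn)
{
    struct dform f;
    int32_t imm = (int32_t)(insn & 0xFFFFu);
    if (imm & 0x8000)          /* sign-extend the 16-bit field */
        imm -= 0x10000;
    f.opcode = insn >> 26;
    f.rd     = (insn >> 21) & 0x1Fu;
    f.ra     = (insn >> 16) & 0x1Fu;
    f.simm   = imm;
    return f;
}
```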


1.2.3.1 PowerPC Instruction Set 
Although these categories are not defined by the PowerPC architecture, the PowerPC 
instructions can be grouped as follows: 


• Integer instructions—These instructions are defined by the UISA. They include 
computational and logical instructions. 
— Integer arithmetic instructions 
— Integer compare instructions 
— Logical instructions 
— Integer rotate and shift instructions 


• Floating-point instructions—These instructions, defined by the UISA, include 
floating-point computational instructions, as well as instructions that manipulate the 
floating-point status and control register (FPSCR). 
— Floating-point arithmetic instructions 
— Floating-point multiply/add instructions 
— Floating-point compare instructions 
— Floating-point status and control instructions 
— Floating-point move instructions 
— Optional floating-point instructions 
• Load/store instructions—These instructions, defined by the UISA, include integer 
and floating-point load and store instructions. 
— Integer load and store instructions 
— Integer load and store with byte reverse instructions 
— Integer load and store multiple instructions 
— Integer load and store string instructions 
— Floating-point load and store instructions 
• The UISA also provides a set of load/store with reservation instructions (lwarx and 
stwcx.) that can be used as primitives for constructing atomic memory operations. 
These are grouped under synchronization instructions. 


• Synchronization instructions—The UISA and VEA define instructions for memory 
synchronization, especially useful for multiprocessing: 


— Load and store with reservation instructions—These UISA-defined instructions 
provide primitives for synchronization operations such as test and set, compare 
and swap, and compare memory. 


— The Synchronize instruction (sync)—This UISA-defined instruction is useful for 
synchronizing load and store operations on a memory bus that is shared by 
multiple devices. 


— Enforce In-Order Execution of I/O (eieio)—The eieio instruction provides an 
ordering function for the effects of load and store operations executed by a 
processor. 


• Flow control instructions—These include branching instructions, condition register 
logical instructions, trap instructions, and other instructions that affect the 
instruction flow. 


— The UISA defines numerous instructions that control the program flow, 
including branch, trap, and system call instructions as well as instructions that 
read, write, or manipulate bits in the condition register. 

— The OEA defines two flow control instructions that provide system linkage. 
These instructions are used for entering and returning from supervisor level. 


• Processor control instructions—These instructions are used for synchronizing 
memory accesses and managing caches and translation lookaside buffers (TLBs) 
(and segment registers in 32-bit implementations). These instructions include move 
to/from special-purpose register instructions (mtspr and mfspr). 


• Memory/cache control instructions—These instructions provide control of caches, 
TLBs, and segment registers. 


— The VEA defines several cache control instructions. 
— The OEA defines one cache control instruction and several memory control 
instructions. 
• External control instructions—The VEA defines two optional instructions for use 
with special input/output devices. 


Note that this grouping of the instructions does not indicate which execution unit executes 
a particular instruction or group of instructions. This is not defined by the PowerPC 
architecture. 
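To show how lwarx and stwcx. can serve as primitives for atomic operations, the C sketch below models a reservation in software and builds a fetch-and-add retry loop on it. It is a simplified, single-threaded model under stated assumptions: the reservation granule is treated as one word, a conditional store to an address other than the reserved one simply fails, and the `interfere` counter is a hypothetical hook that injects a competing store. None of the names are a real API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one processor's reservation. Simplifications:
 * the reservation granule is one word, and a conditional store to an
 * address other than the reserved one is treated as failing. */
struct resv_model {
    uint32_t mem[16];     /* word-indexed memory */
    bool     reserved;    /* is a reservation currently held? */
    unsigned resv_index;  /* word the reservation covers */
};

/* Models lwarx: load the word and establish a reservation on it. */
static uint32_t model_lwarx(struct resv_model *m, unsigned i)
{
    m->reserved = true;
    m->resv_index = i;
    return m->mem[i];
}

/* Models stwcx.: store only if the reservation is still held, and clear
 * the reservation either way. Returns true if the store was performed
 * (on hardware, success is reported in CR0[EQ]). */
static bool model_stwcx(struct resv_model *m, unsigned i, uint32_t v)
{
    bool ok = m->reserved && m->resv_index == i;
    if (ok)
        m->mem[i] = v;
    m->reserved = false;
    return ok;
}

/* A store by another processor or device; it cancels a reservation on
 * the same word. */
static void model_remote_store(struct resv_model *m, unsigned i, uint32_t v)
{
    m->mem[i] = v;
    if (m->reserved && m->resv_index == i)
        m->reserved = false;
}

/* Atomic fetch-and-add built from the two primitives: retry until the
 * conditional store succeeds. While *interfere is positive, one competing
 * store (adding 100) is injected between the load and the store. */
static uint32_t fetch_and_add(struct resv_model *m, unsigned i, uint32_t delta,
                              int *interfere)
{
    uint32_t old;
    do {
        old = model_lwarx(m, i);
        if (*interfere > 0) {
            (*interfere)--;
            model_remote_store(m, i, m->mem[i] + 100);
        }
    } while (!model_stwcx(m, i, old + delta));
    return old;
}
```

When a competing store lands between the load and the conditional store, the reservation is lost, the store is not performed, and the loop reloads the new value before trying again.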


1.2.3.2 Calculating Effective Addresses 


The effective address (EA), also called the logical address, is the address computed by the 
processor when executing a memory access or branch instruction or when fetching the next 
sequential instruction. Unless address translation is disabled, this address is converted by 
the MMU to the appropriate physical address. (Note that the architecture specification uses 
only the term effective address and not logical address.) 


The PowerPC architecture supports the following simple addressing modes for memory 
access instructions: 


• EA = (rA|0) (register indirect) 
• EA = (rA|0) + offset (including offset = 0) (register indirect with immediate index) 
• EA = (rA|0) + rB (register indirect with index) 


These simple addressing modes allow efficient address generation for memory accesses. 
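The three modes can be written as small C functions. The sketch assumes a 32-bit implementation, where the 16-bit displacement is sign-extended and effective-address arithmetic wraps modulo 2^32; the notation (rA|0) means that an rA field of 0 selects the value 0 rather than the contents of GPR0. The function names are illustrative.

```c
#include <stdint.h>

/* (rA|0): an rA field of 0 selects the value 0, not the contents of GPR0. */
static uint32_t ra_or_zero(const uint32_t gpr[32], unsigned ra)
{
    return ra == 0 ? 0 : gpr[ra];
}

/* Register indirect with immediate index: EA = (rA|0) + d, where the 16-bit
 * displacement d is sign-extended. With d = 0 this is plain register
 * indirect. Unsigned arithmetic wraps modulo 2^32. */
static uint32_t ea_disp(const uint32_t gpr[32], unsigned ra, int16_t d)
{
    return ra_or_zero(gpr, ra) + (uint32_t)(int32_t)d;
}

/* Register indirect with index: EA = (rA|0) + (rB). */
static uint32_t ea_indexed(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    return ra_or_zero(gpr, ra) + gpr[rb];
}
```

With GPR3 = 0x1000, a displacement of -8 yields 0x0FF8, and an rA field of 0 ignores whatever GPR0 happens to contain.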


1.2.4 PowerPC Cache Model 


The VEA and OEA portions of the architecture define aspects of cache implementations for 
PowerPC processors. The PowerPC architecture does not define hardware aspects of cache 
implementations. For example, some PowerPC processors may have separate instruction 
and data caches (Harvard architecture), while others have a unified cache. 


The PowerPC architecture allows implementations to control the following memory access 
modes on a page or block basis: 


• Write-back/write-through mode 

• Caching-inhibited mode 

• Memory coherency 

• Guarded/not guarded against speculative accesses 


Coherency is maintained on a cache block basis, and cache control instructions perform 
operations on a cache block basis. The size of the cache block is implementation- 
dependent. The term cache block should not be confused with the notion of a block in 
memory, which is described in Section 1.2.6, “PowerPC Memory Management Model.” 


The VEA portion of the PowerPC architecture defines several instructions for cache 
management. These can be used by user-level software to perform such operations as touch 
operations (which cause the cache block to be speculatively loaded), and operations to 
store, flush, or clear the contents of a cache block. The OEA portion of the architecture 
defines one cache management instruction—the Data Cache Block Invalidate (dcbi) 
instruction. 
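Since cache control instructions operate on whole cache blocks and the block size is implementation-dependent, software that, for example, flushes a buffer usually walks it one block at a time. The sketch below only computes the block addresses such a loop would visit. The callback type, the recorder used for testing, and the 32-byte block size in the example are assumptions; real code would issue an instruction such as dcbf or dcbst in the callback, using the block size of the target processor.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block operation; real code might issue dcbf or dcbst. */
typedef void (*block_op_fn)(uint32_t block_addr, void *ctx);

/* Call op once for every cache block that overlaps [addr, addr + len).
 * block_size must be a power of two. Returns the number of blocks visited. */
static size_t for_each_cache_block(uint32_t addr, size_t len, uint32_t block_size,
                                   block_op_fn op, void *ctx)
{
    if (len == 0)
        return 0;
    uint32_t mask  = block_size - 1;
    uint32_t first = addr & ~mask;                       /* block holding the first byte */
    uint32_t last  = (uint32_t)(addr + len - 1) & ~mask; /* block holding the last byte */
    size_t count = 0;
    for (uint32_t b = first;; b += block_size) {
        op(b, ctx);
        count++;
        if (b == last)   /* stop before advancing, so b never wraps past 0xFFFFFFFF */
            break;
    }
    return count;
}

/* Example callback for testing: records up to 16 visited block addresses. */
struct block_record { uint32_t addrs[16]; size_t count; };

static void record_block(uint32_t block_addr, void *ctx)
{
    struct block_record *r = ctx;
    if (r->count < 16)
        r->addrs[r->count++] = block_addr;
}
```

With 32-byte blocks, the 48-byte range starting at 0x1010 touches two blocks, at 0x1000 and 0x1020.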


1.2.5 PowerPC Exception Model 


The PowerPC exception mechanism, defined by the OEA, allows the processor to change 
to supervisor state as a result of external signals, errors, or unusual conditions arising in the 
execution of instructions. When exceptions occur, information about the state of the 
processor is saved to various registers and the processor begins execution at an address 
(exception vector) predetermined for each type of exception. Exception handler routines 
begin execution in supervisor mode. The PowerPC exception model is described in detail 
in Chapter 6, “Exceptions.” Note also that some aspects regarding exception conditions are 
defined at other levels of the architecture. For example, floating-point exception conditions 
are defined by the UISA, whereas the exception mechanism is defined by the OEA. 


The PowerPC architecture requires that exceptions be handled in program order (excluding 
the optional floating-point imprecise modes and the reset and machine check exceptions); 
therefore, although a particular implementation may recognize exception conditions out of 
order, they are handled strictly in order. When an instruction-caused exception is 
recognized, any unexecuted instructions that appear earlier in the instruction stream, 
including any that have not yet begun to execute, are required to complete before the 
exception is taken. Any exceptions caused by those instructions must be handled first. 
Likewise, exceptions that are asynchronous and precise are recognized when they occur, 
but are not handled until all instructions currently executing successfully complete 
processing and report their results. 


The OEA supports four types of exceptions: 


• Synchronous, precise 

• Synchronous, imprecise 

• Asynchronous, maskable 

• Asynchronous, nonmaskable 


1.2.6 PowerPC Memory Management Model 


The PowerPC memory management unit (MMU) specifications are provided by the 
PowerPC OEA. The primary functions of the MMU in a PowerPC processor are to translate 
logical (effective) addresses to physical addresses for memory accesses and I/O accesses 
(most I/O accesses are assumed to be memory-mapped), and to provide access protection 
on a block or page basis. Note that many aspects of memory management are 
implementation-dependent. Chapter 7, “Memory Management,” describes the conceptual 
model of a PowerPC MMU; however, PowerPC processors may differ in the specific 
hardware used to implement the MMU model of the OEA. 


PowerPC processors require address translation for two types of transactions—instruction 
accesses and data accesses to memory (typically generated by load and store instructions). 


The memory management specification of the PowerPC OEA includes models for both 64- 
and 32-bit implementations. The MMU of a 32-bit PowerPC processor provides 2^32 bytes 
(4 Gbytes) of logical address space accessible to supervisor and user programs with a 
4-Kbyte page size and 256-Mbyte segment size. 


In 32-bit implementations, the entire 4-Gbyte memory space is defined by sixteen 256- 
Mbyte segments. Segments are configured through the 16 segment registers. In 64-bit 
implementations there are more segments than can be maintained in architecture-defined 
registers, so segment descriptors are maintained in segment table entries (STEs) in memory 
and are accessed through the use of a hashing algorithm much like that used for accessing 
page table entries (PTEs). 
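With 4-Kbyte pages and sixteen 256-Mbyte segments, a 32-bit effective address divides into a 4-bit segment register number, a 16-bit page index, and a 12-bit byte offset (4 + 16 + 12 = 32 bits, and 16 × 2^28 = 2^32 bytes). A minimal C sketch of that split, with hypothetical names:

```c
#include <stdint.h>

/* Fields of a 32-bit effective address used in page address translation. */
struct ea_fields {
    unsigned sr;          /* EA bits 0-3: selects SR0-SR15 */
    unsigned page_index;  /* EA bits 4-19: page within the 256-Mbyte segment */
    unsigned offset;      /* EA bits 20-31: byte within the 4-Kbyte page */
};

static struct ea_fields split_ea(uint32_t ea)
{
    struct ea_fields f;
    f.sr         = ea >> 28;
    f.page_index = (ea >> 12) & 0xFFFFu;
    f.offset     = ea & 0xFFFu;
    return f;
}
```

For example, EA 0xC0123456 selects SR12, page index 0x0123, and byte offset 0x456.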


PowerPC processors also have a block address translation (BAT) mechanism for mapping 
large blocks of memory. Block sizes range from 128 Kbytes to 256 Mbytes and are 
software-selectable. In addition, the MMU of 32-bit PowerPC processors uses an interim 
virtual address (52 bits) and hashed page tables in the generation of 32-bit physical addresses. 
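Because BAT block sizes are powers of two, a block's length can be expressed as a mask. The sketch below assumes the 11-bit BL field encoding used by 32-bit implementations, in which 128 Kbytes is encoded as 0x000 and each doubling adds one more low-order 1 bit, up to 0x7FF for 256 Mbytes; the function name is hypothetical.

```c
#include <stdint.h>

/* BL (block length) mask for a BAT block of `size` bytes: 0x000 for
 * 128 Kbytes, 0x001 for 256 Kbytes, 0x003 for 512 Kbytes, up to 0x7FF for
 * 256 Mbytes. Returns -1 if size is not a power of two in that range. */
static int bat_bl_for_size(uint32_t size)
{
    const uint32_t min_size = 128u * 1024u;
    const uint32_t max_size = 256u * 1024u * 1024u;
    if (size < min_size || size > max_size || (size & (size - 1)) != 0)
        return -1;
    return (int)(size / min_size - 1);
}
```

This gives twelve valid sizes; a 1-Mbyte block, for example, is eight times the minimum and maps to BL = 0x007.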


Two types of accesses generated by PowerPC processors require address translation: 
instruction accesses, and data accesses to memory generated by load and store instructions. 
The address translation mechanism is defined in terms of segment tables (or segment 
registers in 32-bit implementations) and page tables used by PowerPC processors to locate 
the logical-to-physical address mapping for instruction and data accesses. The segment 
information translates the logical address to an interim virtual address, and the page table 
information translates the virtual address to a physical address. 
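The two translation steps can be sketched numerically for a 32-bit implementation. Assuming the segment register selected by the effective address supplies a 24-bit VSID, the 52-bit virtual address is that VSID followed by the 28 low-order bits of the effective address. For the hashed page table, the primary hash is taken here as the low-order 19 bits of the VSID XORed with the 16-bit page index, and the secondary hash as its one's complement. The function names are hypothetical, and the step from hash value to page-table address (which also involves SDR1) is not shown.

```c
#include <stdint.h>

/* 52-bit virtual address: 24-bit VSID || 16-bit page index || 12-bit offset.
 * The page index and offset are the low-order 28 bits of the EA. */
static uint64_t virtual_address(uint32_t vsid, uint32_t ea)
{
    return ((uint64_t)(vsid & 0xFFFFFFu) << 28) | (ea & 0x0FFFFFFFu);
}

/* Primary hash (19 bits): low-order 19 bits of the VSID XORed with the
 * page index zero-extended to 19 bits. */
static uint32_t primary_hash(uint32_t vsid, uint32_t ea)
{
    return (vsid & 0x7FFFFu) ^ ((ea >> 12) & 0xFFFFu);
}

/* Secondary hash: the one's complement of the primary hash (19 bits). */
static uint32_t secondary_hash(uint32_t vsid, uint32_t ea)
{
    return ~primary_hash(vsid, ea) & 0x7FFFFu;
}
```

For VSID 0x123456 and EA 0xC0ABC123, the virtual address is 0x1234560ABC123 and the primary hash is 0x23456 XOR 0x0ABC = 0x23EEA.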


Translation lookaside buffers (TLBs) are commonly implemented in PowerPC processors 
to keep recently used page table entries on-chip. Although their exact characteristics are not 
specified by the architecture, the general concepts that are pertinent to the system software 
are described. Similarly, 64-bit implementations may contain segment lookaside buffers 
(SLBs) on-chip that contain recently used segment table entries, but for which the 
PowerPC architecture does not define the exact characteristics. 


The block address translation (BAT) mechanism is a software-controlled array that stores 
the available block address translations on-chip. BAT array entries are implemented as pairs 
of BAT registers that are accessible as supervisor special-purpose registers (SPRs); refer to 
Chapter 7, “Memory Management,” for more information. 


1.3 Changes in This Revision of The Programming 
Environments Manual 


This book reflects changes made to the PowerPC architecture after the publication of Rev. 0 
of The Programming Environments Manual and before Dec. 13, 1994 (Rev. 0.1). In 
addition, it reflects changes made to the architecture after the publication of Rev. 0.1 of The 
Programming Environments Manual and before Aug. 6, 1996 (Rev. 1). Although there are 
many changes in this revision, this section summarizes only the most significant changes 
and clarifications to the architecture specification. 


The main substantive change from Rev. 0 to Rev. 1 for 32-bit processors is the phasing out 
of the direct-store facility. This facility defined segments that were used to generate direct- 
store interface accesses on the external bus to communicate with specialized I/O devices; it 
was not optimized for performance in the PowerPC architecture and was present for 
compatibility with older devices only. As of this revision of the architecture (Rev. 1), direct- 
store segments are an optional processor feature. However, they are not likely to be 
supported in future implementations and new software should not use them. 


Table 1-1 and Table 1-2 list changes made to the UISA that are reflected in this book and 
identify the chapters affected by those changes. Note that many of the changes made in the 
UISA are reflected in both the VEA and OEA portions of the architecture as well. 


Table 1-1. UISA Changes—Rev. 0 to Rev. 0.1 


Clarified intermediate result with respect to floating-point operations (the intermediate 
result has infinite precision and unbounded exponent range). 


Clarified the definition of rounding such that rounding always occurs (specifically, FR and 
FI flags are always affected) for arithmetic, rounding, and conversion instructions. 


Clarified the definition of the term ‘tiny’ (detected before rounding). 


In D.3.2, “Conversion from Floating-Point Number to Unsigned Fixed-Point Integer Word,” 
changed value in FPR 3 from 2^32 to 2^32 - 1 (in 32-bit implementation description). 


Noted additional POWER incompatibility for Store Floating-Point Single (stfs) instruction. 





Table 1-2. UISA Changes—Rev. 0.1 to Rev. 1.0 


Although the stfiwx instruction is an optional instruction, it will likely be required for future 
processors. (Chapters affected: 4, 8, A) 


Added the new Data Cache Block Allocate (dcba) instruction. (Chapters affected: 4, 5, 8, A) 


Deleted some warnings about generating misaligned little-endian access. 


Table 1-3 and Table 1-4 list changes made to the VEA that are reflected in this book and the 
chapters that are affected by those changes. Note that some changes to the UISA are 
reflected in the VEA and in turn, some changes to the VEA affect the OEA as well. 





Table 1-3. VEA Changes—Rev. 0 to Rev. 0.1 


Table 1-4. VEA Changes—Rev. 0.1 to Rev. 1.0 


Added the requirement that caching-inhibited guarded store operations are ordered. 


Clarified use of the dcbf instruction in keeping instruction cache coherency in the case of a 
combined instruction/data cache in a multiprocessor system. 


Table 1-5 and Table 1-6 list changes made to the OEA that are reflected in this book and the 
chapters that are affected by those changes. Note that some changes to the UISA and VEA 
are reflected in the OEA as well. 


Table 1-5. OEA Changes—Rev. 0 to Rev. 0.1 


Table 1-6. OEA Changes—Rev. 0.1 to Rev. 1.0 


Changed definition of direct-store segments to an optional processor feature that is not 
likely to be supported in future implementations and new software should not use it. 
(Chapters affected: 2, 6, 7) 


Changed the ranges of bits saved from MSR to SRR1 (and restored from SRR1 to MSR on 
rfi) on an exception. (Chapters affected: 2, 6) 


Clarified the definition of execution synchronization. Also clarified that the mtmsr and 
mtmsrd instructions are not execution synchronizing. 


Clarified the use of memory allocated for predefined uses (including the exception 
vectors). 

Revised the page table update synchronization requirements and recommended code 
sequences. 


Chapter 2 
PowerPC Register Set 


This chapter describes the register organization defined by the three levels of the PowerPC 
architecture—user instruction set architecture (UISA), virtual environment architecture 
(VEA), and operating environment architecture (OEA). The PowerPC architecture defines 
register-to-register operations for all computational instructions. Source data for these 
instructions are accessed from the on-chip registers or are provided as immediate values 
embedded in the opcode. The three-register instruction format allows specification of a 
target register distinct from the two source registers, thus preserving the original data for 
use by other instructions and reducing the number of instructions required for certain 
operations. Data is transferred between memory and registers with explicit load and store 
instructions only. 




Note that the handling of reserved bits in any register is implementation-dependent. 
Software is permitted to write any value to a reserved bit in a register. However, a 
subsequent reading of the reserved bit returns 0 if the value last written to the bit was 0 and 
returns an undefined value (may be 0 or 1) otherwise. This means that even if the last value 
written to a reserved bit was 1, reading that bit may return 0. 


2.1 PowerPC UISA Register Set 


The PowerPC UISA registers, shown in Figure 2-1, can be accessed by either user- or 
supervisor-level instructions (the architecture specification refers to user-level and 
supervisor-level as problem state and privileged state, respectively). The general-purpose 
registers (GPRs) and floating-point registers (FPRs) are accessed as instruction operands. 
Access to registers can be explicit (that is, through the use of specific instructions for that 
purpose such as Move to Special-Purpose Register (mtspr) and Move from Special- 
Purpose Register (mfspr) instructions) or implicit as part of the execution of an instruction. 
Some registers are accessed both explicitly and implicitly. 


The number to the right of the register names indicates the number that is used in the syntax 
of the instruction operands to access the register (for example, the number used to access 
the XER is SPR 1). 
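The SPR number is not placed in the instruction word as a plain 10-bit value; its two 5-bit halves are swapped in the mtspr and mfspr encodings. The C sketch below assembles both instructions, assuming primary opcode 31 with extended opcodes 339 (mfspr) and 467 (mtspr); the helper names are hypothetical. For example, it produces 0x7C6802A6 for mfspr r3,LR (the simplified mnemonic mflr r3).

```c
#include <stdint.h>

/* SPR numbers for user-level registers described in this chapter. */
enum { SPR_XER = 1, SPR_LR = 8, SPR_CTR = 9 };

/* The 10-bit SPR number occupies instruction bits 11-20 with its two
 * 5-bit halves swapped: the low-order five bits come first. */
static uint32_t spr_field(unsigned spr)
{
    return ((spr & 0x1Fu) << 5) | ((spr >> 5) & 0x1Fu);
}

/* mfspr rD,SPR: primary opcode 31, extended opcode 339. */
static uint32_t encode_mfspr(unsigned rd, unsigned spr)
{
    return (31u << 26) | ((rd & 0x1Fu) << 21) | (spr_field(spr) << 11) | (339u << 1);
}

/* mtspr SPR,rS: primary opcode 31, extended opcode 467. */
static uint32_t encode_mtspr(unsigned spr, unsigned rs)
{
    return (31u << 26) | ((rs & 0x1Fu) << 21) | (spr_field(spr) << 11) | (467u << 1);
}
```

For SPR numbers below 32 the swap simply moves the number into the upper half of the field, which is why LR (SPR 8) appears as 0x100 in the SPR field.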


Note that the general-purpose registers (GPRs), link register (LR), and count register (CTR) 
are 64 bits wide on 64-bit implementations and 32 bits wide on 32-bit implementations. 


Figure 2-1. UISA Programming Model—User-Level Registers 


The user-level registers can be accessed by all software with either user or supervisor 
privileges. The user-level register set includes the following: 


General-purpose registers (GPRs). The general-purpose register file consists of 32 
GPRs designated as GPR0-GPR31. The GPRs serve as data source or destination 
registers for all integer instructions and provide data for generating addresses. See 
Section 2.1.1, “General-Purpose Registers (GPRs),” for more information. 


Floating-point registers (FPRs). The floating-point register file consists of 32 FPRs 
designated as FPR0-FPR31; these registers serve as the data source or destination 
for all floating-point instructions. While the floating-point model includes data 
objects of either single- or double-precision floating-point format, the FPRs only 
contain data in double-precision format. For more information, see Section 2.1.2, 
“Floating-Point Registers (FPRs).” 


Condition register (CR). The CR is a 32-bit register, divided into eight 4-bit fields, 
CR0-CR7, that reflects the results of certain arithmetic operations and provides a 
mechanism for testing and branching. For more information, see Section 2.1.3, 
“Condition Register (CR).” 


Floating-point status and control register (FPSCR). The FPSCR contains all 
floating-point exception signal bits, exception summary bits, exception enable bits, 
and rounding control bits needed for compliance with the IEEE 754 standard. For 
more information, see Section 2.1.4, “Floating-Point Status and Control Register 
(FPSCR).” (Note that the architecture specification refers to exceptions as 
interrupts.) 


XER register (XER). The XER indicates overflows and carry conditions for integer 
operations and the number of bytes to be transferred by the load/store string indexed 
instructions. For more information, see Section 2.1.5, “XER Register (XER).” 


Link register (LR). The LR provides the branch target address for the Branch 
Conditional to Link Register (bclrx) instructions, and can optionally be used to hold 
the effective address of the instruction that follows a branch with link update 
instruction in the instruction stream, typically used for loading the return pointer for 
a subroutine. For more information, see Section 2.1.6, “Link Register (LR).” 


Count register (CTR). The CTR holds a loop count that can be decremented during 
execution of appropriately coded branch instructions. The CTR can also provide the 
branch target address for the Branch Conditional to Count Register (bcctrx) 
instructions. For more information, see Section 2.1.7, “Count Register (CTR).” 


2.1.1 General-Purpose Registers (GPRs) 


Integer data is manipulated in the processor’s 32 GPRs shown in Figure 2-2. These registers 
are 64-bit registers in 64-bit implementations and 32-bit registers in 32-bit 
implementations. The GPRs are accessed as source and destination registers in the 
instruction syntax. 
Figure 2-2. General-Purpose Registers (GPRs) 


2.1.2 Floating-Point Registers (FPRs) 


The PowerPC architecture provides thirty-two 64-bit FPRs as shown in Figure 2-3. These 
registers are accessed as source and destination registers for floating-point instructions. 
Each FPR supports the double-precision floating-point format. Every instruction that 
interprets the contents of an FPR as a floating-point value uses the double-precision 
floating-point format for this interpretation. Note that FPRs are 64 bits on both 64-bit and 
32-bit processor implementations. 


All floating-point arithmetic instructions operate on data located in FPRs and, with the 
exception of compare instructions, place the result into an FPR. Information about the 
status of floating-point operations is placed into the FPSCR and in some cases, into the CR 
after the completion of instruction execution. For information on how the CR is affected for 
floating-point operations, see Section 2.1.3, “Condition Register (CR).” 


Load and store double-word instructions transfer 64 bits of data between memory and the 
FPRs with no conversion. Load single instructions are provided to read a single-precision 
floating-point value from memory, convert it to double-precision floating-point format, and 
place it in the target floating-point register. Store single-precision instructions are provided 
to read a double-precision floating-point value from a floating-point register, convert it to 
single-precision floating-point format, and place it in the target memory location. 


Single- and double-precision arithmetic instructions accept values from the FPRs in 
double-precision format. For single-precision arithmetic and store instructions, all input 
values must be representable in single-precision format; otherwise, the result placed into 
the target FPR (or the memory location) and the setting of status bits in the FPSCR and in 
the condition register (if the instruction’s record bit, Rc, is set) are undefined. 
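
As a rough illustration of the representability rule above, the following C sketch tests whether a double-precision value could be held exactly in single-precision format. It assumes the host compiler's float and double are IEEE 754 single and double precision; the function name is hypothetical and does not correspond to any PowerPC instruction.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether a double can be held exactly in
   IEEE 754 single precision. Assumes the host's float and double are
   IEEE binary32 and binary64. */
static bool representable_in_single(double d)
{
    if (isnan(d) || isinf(d))
        return true;               /* both formats have NaNs and infinities; NaN payloads are ignored */
    if (d > FLT_MAX || d < -FLT_MAX)
        return false;              /* finite but outside the single-precision range */
    return (double)(float)d == d;  /* an exact round trip means no fraction bits were lost */
}
```

For example, 1.5 passes the check, while the double closest to 0.1 does not, because its fraction needs more bits than single precision provides.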


The floating-point arithmetic instructions produce intermediate results that may be 
regarded as infinitely precise and with unbounded exponent range. This intermediate result 
is normalized or denormalized if required, and then rounded to the destination format. The 
final result is then placed into the target FPR in the double-precision format or in fixed-point 
format, depending on the instruction. Refer to Section 3.3, “Floating-Point Execution 
Models—UISA,” for more information. 
Figure 2-3. Floating-Point Registers (FPRs) 


2.1.3 Condition Register (CR) 


The condition register (CR) is a 32-bit register that reflects the result of certain operations 
and provides a mechanism for testing and branching. The bits in the CR are grouped into 
eight 4-bit fields, CR0-CR7, as shown in Figure 2-4. 





CR0 (bits 0-3) | CR1 (bits 4-7) | CR2 (bits 8-11) | CR3 (bits 12-15) |
CR4 (bits 16-19) | CR5 (bits 20-23) | CR6 (bits 24-27) | CR7 (bits 28-31)

Figure 2-4. Condition Register (CR) 
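
Because the manual numbers bits from 0 at the most-significant end, field CRn occupies CR bits 4n through 4n + 3. A minimal C sketch of that mapping follows, assuming the CR is modeled as a plain uint32_t; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: the CR is modeled as a uint32_t whose most-significant bit is
   CR bit 0, matching the manual's bit numbering. */
static unsigned cr_get_field(uint32_t cr, unsigned n)
{
    assert(n < 8);                              /* fields CR0..CR7 */
    return (cr >> (28 - 4 * n)) & 0xFu;         /* CRn is CR bits 4n..4n+3 */
}

static uint32_t cr_set_field(uint32_t cr, unsigned n, unsigned value)
{
    assert(n < 8 && value <= 0xFu);
    unsigned shift = 28 - 4 * n;                /* distance of CRn from the low-order end */
    return (cr & ~(UINT32_C(0xF) << shift)) | ((uint32_t)value << shift);
}
```

For example, cr_get_field(0x80000000, 0) returns 0x8, the first (LT) bit of CR0.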


The CR fields can be set in one of the following ways:

• Specified fields of the CR can be set from a GPR by using the mtcrf instruction.

• The contents of one CR field can be copied to another CR field by using the mcrf
  instruction.

• XER[0-3] can be copied to a specified field of the CR by using the mcrxr instruction.

• A specified field of the FPSCR can be copied to a specified field of the CR by using
  the mcrfs instruction.

• Condition register logical instructions can be used to perform logical operations on
  specified bits in the condition register.

• CR0 can be the implicit result of an integer instruction.

• CR1 can be the implicit result of a floating-point instruction.

• A specified CR field can indicate the result of either an integer or floating-point
  compare instruction.


Note that branch instructions are provided to test individual CR bits. 


Chapter 2. PowerPC Register Set 2-5 


2.1.3.1 Condition Register CR0 Field Definition 

For all integer instructions, when the CR is set to reflect the result of the operation (that is,
when Rc = 1), and for addic., andi., and andis., the first three bits of CR0 are set by an
algebraic comparison of the result to zero; the fourth bit of CR0 is copied from XER[SO].
For integer instructions, CR bits 0-3 are set to reflect the result as a signed quantity.


The CR bits are interpreted as shown in Table 2-1. If any portion of the result is undefined,
the value placed into the first three bits of CR0 is undefined.


Table 2-1. Bit Settings for CR0 Field of CR

CR0 Bit  Description
0        Negative (LT). This bit is set when the result is negative.
1        Positive (GT). This bit is set when the result is positive (and not zero).
2        Zero (EQ). This bit is set when the result is zero.
3        Summary overflow (SO). This is a copy of the final state of XER[SO] at the
         completion of the instruction.

Note that CR0 may not reflect the true (that is, infinitely precise) result if overflow occurs. 
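
The rules in Table 2-1 can be sketched in C as follows, assuming a 32-bit implementation. The function name is hypothetical; it returns the CR0 field as a 4-bit value with LT as its most-significant bit.

```c
#include <stdint.h>

/* Hypothetical model of how a record-form integer instruction (Rc = 1) sets CR0 on a
   32-bit implementation. Returns LT GT EQ SO as a 4-bit value (LT = 0x8 ... SO = 0x1). */
static unsigned cr0_from_result(uint32_t result, unsigned xer_so)
{
    int32_t s = (int32_t)result;      /* the result is compared to zero as a signed quantity
                                         (two's-complement conversion assumed) */
    unsigned field = (s < 0) ? 0x8u   /* LT */
                   : (s > 0) ? 0x4u   /* GT */
                   :           0x2u;  /* EQ */
    return field | (xer_so & 1u);     /* SO is copied from XER[SO] */
}
```

This also shows the overflow caveat: adding 1 to 0x7FFF_FFFF wraps to 0x8000_0000, so cr0_from_result returns 0x8 (LT) even though the infinitely precise sum is positive.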


2.1.3.2 Condition Register CR1 Field Definition 

For all floating-point instructions, when the CR is set to reflect the result of the operation
(that is, when the instruction’s record bit, Rc, is set), CR1 (bits 4-7 of the CR) is copied
from bits 0-3 of the FPSCR and indicates the floating-point exception status. For more
information about the FPSCR, see Section 2.1.4, “Floating-Point Status and Control
Register (FPSCR).” The bit settings for the CR1 field are shown in Table 2-2.


Table 2-2. Bit Settings for CR1 Field of CR

CR Bit  Description
4       Floating-point exception (FX). This is a copy of the final state of FPSCR[FX] at the
        completion of the instruction.
5       Floating-point enabled exception (FEX). This is a copy of the final state of
        FPSCR[FEX] at the completion of the instruction.
6       Floating-point invalid exception (VX). This is a copy of the final state of FPSCR[VX]
        at the completion of the instruction.
7       Floating-point overflow exception (OX). This is a copy of the final state of
        FPSCR[OX] at the completion of the instruction.
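
A short C sketch of the copy described above follows, assuming both registers are modeled as uint32_t values with bit 0 as the most-significant bit; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical sketch: after a floating-point instruction with Rc = 1, CR1 (CR bits 4-7)
   receives FPSCR bits 0-3 (FX, FEX, VX, OX). All other CR fields are left unchanged. */
static uint32_t cr_after_fp_record(uint32_t cr, uint32_t fpscr)
{
    uint32_t fx_fex_vx_ox = fpscr >> 28;                          /* FPSCR bits 0-3 */
    return (cr & ~UINT32_C(0x0F000000)) | (fx_fex_vx_ox << 24);   /* into CR bits 4-7 */
}
```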
2.1.3.3 Condition Register CRn Field—Compare Instruction 


For a compare instruction, when a specified CR field is set to reflect the result of the 
comparison, the bits of the specified field are interpreted as shown in Table 2-3. 


Table 2-3. CRn Field Bit Settings for Compare Instructions

Bit  Description
0    Less than or floating-point less than (LT, FL).
     For integer compare instructions: rA < SIMM or rB (signed comparison) or
     rA < UIMM or rB (unsigned comparison).
     For floating-point compare instructions: frA < frB.

1    Greater than or floating-point greater than (GT, FG).
     For integer compare instructions: rA > SIMM or rB (signed comparison) or
     rA > UIMM or rB (unsigned comparison).
     For floating-point compare instructions: frA > frB.

2    Equal or floating-point equal (EQ, FE).
     For integer compare instructions: rA = SIMM, UIMM, or rB.
     For floating-point compare instructions: frA = frB.

3    Summary overflow or floating-point unordered (SO, FU).
     For integer compare instructions: This is a copy of the final state of XER[SO]
     at the completion of the instruction.
     For floating-point compare instructions: One or both of frA and frB is a Not a
     Number (NaN).

Notes: Here, the bit indicates the bit number in any one of the 4-bit subfields, CR0-CR7.
For a complete description of instruction syntax conventions, refer to Table 8-2 on
page 8-2.
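
The following C sketch models the field values in Table 2-3 for a signed word compare (as cmpw performs), an unsigned word compare (as cmplw performs), and an unordered floating-point compare (as fcmpu performs). The function names are hypothetical, each returns the field as a 4-bit value with bit 0 (LT/FL) as its most-significant bit, and the FPSCR side effects of the floating-point compare are not modeled.

```c
#include <math.h>
#include <stdint.h>

/* Field bits: LT/FL = 0x8, GT/FG = 0x4, EQ/FE = 0x2, SO/FU = 0x1. */
static unsigned cmpw_field(uint32_t ra, uint32_t rb, unsigned xer_so)
{
    int32_t a = (int32_t)ra, b = (int32_t)rb;      /* signed comparison */
    unsigned f = (a < b) ? 0x8u : (a > b) ? 0x4u : 0x2u;
    return f | (xer_so & 1u);                      /* SO copied from XER[SO] */
}

static unsigned cmplw_field(uint32_t ra, uint32_t rb, unsigned xer_so)
{
    unsigned f = (ra < rb) ? 0x8u : (ra > rb) ? 0x4u : 0x2u;   /* unsigned comparison */
    return f | (xer_so & 1u);
}

static unsigned fcmpu_field(double fra, double frb)
{
    if (isnan(fra) || isnan(frb))
        return 0x1u;                               /* unordered: only FU is set */
    return (fra < frb) ? 0x8u : (fra > frb) ? 0x4u : 0x2u;
}
```

The two integer helpers differ only in how they interpret the operands: comparing 0xFFFF_FFFF with 1 yields LT in the signed case (the value is -1) and GT in the unsigned case.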


2.1.4 Floating-Point Status and Control Register (FPSCR) 
The FPSCR, shown in Figure 2-5, contains bits that do the following: 


• Record exceptions generated by floating-point operations
• Record the type of the result produced by a floating-point operation
• Control the rounding mode used by floating-point operations
• Enable or disable the reporting of exceptions (invoking the exception handler)


Bits 0-23 are status bits. Bits 24-31 are control bits. Status bits in the FPSCR are updated 
at the completion of the instruction execution. 


Except for the floating-point enabled exception summary (FEX) and floating-point invalid
operation exception summary (VX), the exception condition bits in the FPSCR (bits 0-12
and 21-23) are sticky. Once set, sticky bits remain set until they are cleared by an mcrfs,
mtfsfi, mtfsf, or mtfsb0 instruction.


FEX and VX are the logical ORs of other FPSCR bits. Therefore, these two bits are not 
listed among the FPSCR bits directly affected by the various instructions. 
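
To make the relationship concrete, the C sketch below recomputes VX and FEX from the other FPSCR bits, and applies the FPSCR[XX] rule given in Table 2-4 when an instruction updates FPSCR[FI]. The enum, macro, and function names are hypothetical; bit numbers follow Table 2-4, with bit 0 as the most-significant bit of a uint32_t. Setting FPSCR[FX] on a 0-to-1 transition is not modeled.

```c
#include <stdint.h>

/* FPSCR bit numbers as in Table 2-4; bit 0 is the most-significant bit. */
enum fpscr_bit {
    FX = 0, FEX = 1, VX = 2, OX = 3, UX = 4, ZX = 5, XX = 6,
    VXSNAN = 7, VXISI = 8, VXIDI = 9, VXZDZ = 10, VXIMZ = 11, VXVC = 12,
    FR = 13, FI = 14,
    VXSOFT = 21, VXSQRT = 22, VXCVI = 23,
    VE = 24, OE = 25, UE = 26, ZE = 27, XE = 28
};

#define FPSCR_BIT(n) (UINT32_C(1) << (31 - (n)))

/* Hypothetical helper: recompute the non-sticky summaries VX and FEX from the other bits. */
static uint32_t fpscr_update_summaries(uint32_t f)
{
    const uint32_t invalid_bits =
        FPSCR_BIT(VXSNAN) | FPSCR_BIT(VXISI) | FPSCR_BIT(VXIDI) | FPSCR_BIT(VXZDZ) |
        FPSCR_BIT(VXIMZ) | FPSCR_BIT(VXVC) | FPSCR_BIT(VXSOFT) | FPSCR_BIT(VXSQRT) |
        FPSCR_BIT(VXCVI);

    int vx  = (f & invalid_bits) != 0;             /* VX: OR of all invalid-operation bits */
    int fex = (vx && (f & FPSCR_BIT(VE))) ||       /* FEX: each exception ANDed with its enable */
              ((f & FPSCR_BIT(OX)) && (f & FPSCR_BIT(OE))) ||
              ((f & FPSCR_BIT(UX)) && (f & FPSCR_BIT(UE))) ||
              ((f & FPSCR_BIT(ZX)) && (f & FPSCR_BIT(ZE))) ||
              ((f & FPSCR_BIT(XX)) && (f & FPSCR_BIT(XE)));

    f &= ~(FPSCR_BIT(VX) | FPSCR_BIT(FEX));        /* summaries are derived, never sticky */
    if (vx)
        f |= FPSCR_BIT(VX);
    if (fex)
        f |= FPSCR_BIT(FEX);
    return f;
}

/* Hypothetical helper: an instruction that affects FI also ORs the new FI into sticky XX. */
static uint32_t fpscr_set_fi(uint32_t f, int fi)
{
    if (fi)
        f |= FPSCR_BIT(FI) | FPSCR_BIT(XX);
    else
        f &= ~FPSCR_BIT(FI);                       /* XX keeps whatever value it already had */
    return f;
}
```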
[FPSCR layout: bits 0-12 FX, FEX, VX, OX, UX, ZX, XX, VXSNAN, VXISI, VXIDI, VXZDZ, VXIMZ,
VXVC; bit 13 FR; bit 14 FI; bits 15-19 FPRF; bit 20 reserved; bits 21-23 VXSOFT, VXSQRT,
VXCVI; bits 24-28 VE, OE, UE, ZE, XE; bit 29 NI; bits 30-31 RN]

Figure 2-5. Floating-Point Status and Control Register (FPSCR) 
A listing of FPSCR bit settings is shown in Table 2-4. 
Table 2-4. FPSCR Bit Settings

Bits    Name     Description
0       FX       Floating-point exception summary. Every floating-point instruction, except
                 mtfsfi and mtfsf, implicitly sets FPSCR[FX] if that instruction causes any of
                 the floating-point exception bits in the FPSCR to transition from 0 to 1. The
                 mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 instructions can alter FPSCR[FX]
                 explicitly. This is a sticky bit.


1       FEX      Floating-point enabled exception summary. This bit signals the occurrence of
                 any of the enabled exception conditions. It is the logical OR of all the
                 floating-point exception bits masked by their respective enable bits
                 (FEX = (VX & VE) | (OX & OE) | (UX & UE) | (ZX & ZE) | (XX & XE)). The mcrfs,
                 mtfsf, mtfsfi, mtfsb0, and mtfsb1 instructions cannot alter FPSCR[FEX]
                 explicitly. This is not a sticky bit.

2       VX       Floating-point invalid operation exception summary. This bit signals the
                 occurrence of any invalid operation exception. It is the logical OR of all of
                 the invalid operation exceptions. The mcrfs, mtfsf, mtfsfi, mtfsb0, and mtfsb1
                 instructions cannot alter FPSCR[VX] explicitly. This is not a sticky bit.

3       OX       Floating-point overflow exception. This is a sticky bit. See Section 3.3.6.2,
                 “Overflow, Underflow, and Inexact Exception Conditions.”

4       UX       Floating-point underflow exception. This is a sticky bit. See Section 3.3.6.2.2,
                 “Underflow Exception Condition.”

5       ZX       Floating-point zero divide exception. This is a sticky bit. See Section
                 3.3.6.1.2, “Zero Divide Exception Condition.”

6       XX       Floating-point inexact exception. This is a sticky bit. See Section 3.3.6.2.3,
                 “Inexact Exception Condition.”
                 FPSCR[XX] is the sticky version of FPSCR[FI]. The following rules describe how
                 FPSCR[XX] is set by a given instruction:
                 • If the instruction affects FPSCR[FI], the new value of FPSCR[XX] is obtained
                   by logically ORing the old value of FPSCR[XX] with the new value of FPSCR[FI].
                 • If the instruction does not affect FPSCR[FI], the value of FPSCR[XX] is
                   unchanged.


7       VXSNAN   Floating-point invalid operation exception for SNaN. This is a sticky bit. See
                 Section 3.3.6.1.1, “Invalid Operation Exception Condition.”

8       VXISI    Floating-point invalid operation exception for ∞ - ∞. This is a sticky bit. See
                 Section 3.3.6.1.1, “Invalid Operation Exception Condition.”

9       VXIDI    Floating-point invalid operation exception for ∞ ÷ ∞. This is a sticky bit. See
                 Section 3.3.6.1.1, “Invalid Operation Exception Condition.”

10      VXZDZ    Floating-point invalid operation exception for 0 ÷ 0. This is a sticky bit. See
                 Section 3.3.6.1.1, “Invalid Operation Exception Condition.”
11      VXIMZ    Floating-point invalid operation exception for ∞ × 0. This is a sticky bit. See
                 Section 3.3.6.1.1, “Invalid Operation Exception Condition.”

12      VXVC     Floating-point invalid operation exception for invalid compare. This is a sticky
                 bit. See Section 3.3.6.1.1, “Invalid Operation Exception Condition.”

13      FR       Floating-point fraction rounded. The last arithmetic or rounding and conversion
                 instruction that rounded the intermediate result incremented the fraction. See
                 Section 3.3.5, “Rounding.” This bit is not sticky.

14      FI       Floating-point fraction inexact. The last arithmetic or rounding and conversion
                 instruction either rounded the intermediate result (producing an inexact
                 fraction) or caused a disabled overflow exception. See Section 3.3.5,
                 “Rounding.” This is not a sticky bit. For more information regarding the
                 relationship between FPSCR[FI] and FPSCR[XX], see the description of the
                 FPSCR[XX] bit.

15-19   FPRF     Floating-point result flags. For arithmetic, rounding, and conversion
                 instructions, the field is based on the result placed into the target register,
                 except that if any portion of the result is undefined, the value placed here is
                 undefined.
                 15     Floating-point result class descriptor (C). Arithmetic, rounding, and
                        conversion instructions may set this bit with the FPCC bits to indicate
                        the class of the result as shown in Table 2-5.
                 16-19  Floating-point condition code (FPCC). Floating-point compare instructions
                        always set one of the FPCC bits to one and the other three FPCC bits to
                        zero. Arithmetic, rounding, and conversion instructions may set the FPCC
                        bits with the C bit to indicate the class of the result. Note that in this
                        case the high-order three bits of the FPCC retain their relational
                        significance indicating that the value is less than, greater than, or
                        equal to zero.
                        16  Floating-point less than or negative (FL or <)
                        17  Floating-point greater than or positive (FG or >)
                        18  Floating-point equal or zero (FE or =)
                        19  Floating-point unordered or NaN (FU or ?)
                 Note that these are not sticky bits.


20               Reserved

21      VXSOFT   Floating-point invalid operation exception for software request. This is a
                 sticky bit. This bit can be altered only by the mcrfs, mtfsfi, mtfsf, mtfsb0, or
                 mtfsb1 instructions. For more detailed information, refer to Section 3.3.6.1.1,
                 “Invalid Operation Exception Condition.”

22      VXSQRT   Floating-point invalid operation exception for invalid square root. This is a
                 sticky bit. For more detailed information, refer to Section 3.3.6.1.1, “Invalid
                 Operation Exception Condition.”

23      VXCVI    Floating-point invalid operation exception for invalid integer convert. This is
                 a sticky bit. See Section 3.3.6.1.1, “Invalid Operation Exception Condition.”

24      VE       Floating-point invalid operation exception enable. See Section 3.3.6.1.1,
                 “Invalid Operation Exception Condition.”

25      OE       IEEE floating-point overflow exception enable. See Section 3.3.6.2, “Overflow,
                 Underflow, and Inexact Exception Conditions.”

26      UE       IEEE floating-point underflow exception enable. See Section 3.3.6.2.2,
                 “Underflow Exception Condition.”

27      ZE       IEEE floating-point zero divide exception enable. See Section 3.3.6.1.2, “Zero
                 Divide Exception Condition.”

28      XE       Floating-point inexact exception enable. See Section 3.3.6.2.3, “Inexact
                 Exception Condition.”


29      NI       Floating-point non-IEEE mode. If this bit is set, results need not conform with
                 IEEE standards and the other FPSCR bits may have meanings other than those
                 described here. If the bit is set and if all implementation-specific
                 requirements are met and if an IEEE-conforming result of a floating-point
                 operation would be a denormalized number, the result produced is zero
                 (retaining the sign of the denormalized number). Any other effects associated
                 with setting this bit are described in the user’s manual for the implementation
                 (the effects are implementation-dependent).

30-31   RN       Floating-point rounding control. See Section 3.3.5, “Rounding.”
                 00  Round to nearest
                 01  Round toward zero
                 10  Round toward +infinity
                 11  Round toward -infinity
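
As one way to relate the RN encodings to portable code, the following C sketch maps FPSCR[RN] onto the C99 <fenv.h> rounding-mode macros. The mapping function is hypothetical, and because the C standard makes each of these macros optional, the sketch returns -1 for any mode the host does not provide.

```c
#include <fenv.h>
#include <stdint.h>

/* Hypothetical helper: translate FPSCR[RN] (bits 30-31, the two least-significant bits of
   the register value) into the matching C99 rounding-mode macro, or -1 if unavailable. */
static int fpscr_rn_to_c(uint32_t fpscr)
{
    switch (fpscr & 3u) {
#ifdef FE_TONEAREST
    case 0: return FE_TONEAREST;    /* 00: round to nearest */
#endif
#ifdef FE_TOWARDZERO
    case 1: return FE_TOWARDZERO;   /* 01: round toward zero */
#endif
#ifdef FE_UPWARD
    case 2: return FE_UPWARD;       /* 10: round toward +infinity */
#endif
#ifdef FE_DOWNWARD
    case 3: return FE_DOWNWARD;     /* 11: round toward -infinity */
#endif
    default: return -1;             /* mode not supported by this host */
    }
}
```

A caller could pass a non-negative result to the standard fesetround function to approximate the PowerPC rounding mode on a host processor.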





Table 2-5 illustrates the floating-point result flags used by PowerPC processors. The result 
flags correspond to FPSCR bits 15-19. 


Table 2-5. Floating-Point Result Flags in FPSCR 


Result Flags (Bits 15-19)     Result Value Class
C   <   >   =   ?
1   0   0   0   1             Quiet NaN
0   1   0   0   1             -Infinity
0   1   0   0   0             -Normalized number
1   1   0   0   0             -Denormalized number
1   0   0   1   0             -Zero
0   0   0   1   0             +Zero
1   0   1   0   0             +Denormalized number
0   0   1   0   0             +Normalized number
0   0   1   0   1             +Infinity

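The table can be expressed as a small C classification function, sketched below under the assumption that the host's double is IEEE 754 double precision. The function name is hypothetical; it returns the five result-flag bits with C as the most-significant bit, and because Table 2-5 has no signaling-NaN entry, it reports every NaN as a quiet NaN.

```c
#include <math.h>

/* Hypothetical sketch: the 5-bit FPRF value (C, FL, FG, FE, FU) that Table 2-5 assigns
   to a result, with C as bit 4 of the returned value and FU as bit 0. */
static unsigned fprf_class(double x)
{
    int neg = signbit(x) != 0;
    switch (fpclassify(x)) {
    case FP_NAN:       return 0x11u;                 /* 1 0001  quiet NaN */
    case FP_INFINITE:  return neg ? 0x09u : 0x05u;   /* 0 1001 / 0 0101 */
    case FP_NORMAL:    return neg ? 0x08u : 0x04u;   /* 0 1000 / 0 0100 */
    case FP_SUBNORMAL: return neg ? 0x18u : 0x14u;   /* 1 1000 / 1 0100 */
    case FP_ZERO:      return neg ? 0x12u : 0x02u;   /* 1 0010 / 0 0010 */
    default:           return 0u;                    /* not reached for IEEE doubles */
    }
}
```

The low four bits keep their relational meaning: negative results set FL, positive results set FG, and zeros set FE, while the C bit separates denormalized numbers, zeros, and NaNs from the other classes that share those patterns.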
2.1.5 XER Register (XER) 
The XER register (XER) is a 32-bit, user-level register shown in Figure 2-6. 





SO (bit 0) | OV (bit 1) | CA (bit 2) | reserved (bits 3-24) | byte count (bits 25-31)


Figure 2-6. XER Register 


The bit definitions for XER, shown in Table 2-6, are based on the operation of an
instruction considered as a whole, not on intermediate results. For example, the result of the
Subtract from Carrying (subfcx) instruction is specified as the sum of three values. This
instruction sets bits in the XER based on the entire operation, not on an intermediate sum.
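
The whole-operation rule can be illustrated with a C sketch of subfc on a 32-bit implementation, computing rD = ~rA + rB + 1 in 64 bits so that CA is the carry out of the complete three-value sum. The struct and function names are hypothetical. Deriving CA from ~rA + rB alone would be wrong: with rA = rB = 0, that partial sum is 0xFFFF_FFFF with no carry, while the full sum carries.

```c
#include <stdint.h>

/* Hypothetical model of subfc (Subtract from Carrying) on a 32-bit implementation. */
struct subfc_result { uint32_t rd; unsigned ca; };

static struct subfc_result subfc(uint32_t ra, uint32_t rb)
{
    /* rD = ~rA + rB + 1, evaluated as one sum in 64 bits so the carry is not lost. */
    uint64_t sum = (uint64_t)(uint32_t)~ra + rb + 1;
    struct subfc_result r = { (uint32_t)sum, (unsigned)(sum >> 32) };  /* CA = carry out of msb */
    return r;
}
```

Because ~rA + rB + 1 equals rB - rA + 2^32, CA ends up set exactly when rB is greater than or equal to rA as unsigned values, that is, when the subtraction needs no borrow.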


Table 2-6. XER Bit Definitions

Bits    Name     Description
0       SO       Summary overflow. The summary overflow bit (SO) is set whenever an instruction
                 (except mtspr) sets the overflow bit (OV). Once set, the SO bit remains set until
                 it is cleared by an mtspr instruction (specifying the XER) or an mcrxr
                 instruction. It is not altered by compare instructions, nor by other
                 instructions (except mtspr to the XER, and mcrxr) that cannot overflow.
                 Executing an mtspr instruction to the XER, supplying the values zero for SO and
                 one for OV, causes SO to be cleared and OV to be set.

1       OV       Overflow. The overflow bit (OV) is set to indicate that an overflow has occurred
                 during execution of an instruction. Add, subtract from, and negate instructions
                 having OE = 1 set the OV bit if the carry out of the msb is not equal to the
                 carry out of the msb + 1, and clear it otherwise. Multiply low and divide
                 instructions having OE = 1 set the OV bit if the result cannot be represented in
                 64 bits (mulld, divd, divdu) or in 32 bits (mullw, divw, divwu), and clear it
                 otherwise. The OV bit is not altered by compare instructions, nor by other
                 instructions (except mtspr to the XER, and mcrxr) that cannot overflow.


2       CA       Carry. The carry bit (CA) is set during execution of the following instructions:
                 • Add carrying, subtract from carrying, add extended, and subtract from extended
                   instructions set CA if there is a carry out of the msb, and clear it otherwise.
                 • Shift right algebraic instructions set CA if any 1 bits have been shifted out
                   of a negative operand, and clear it otherwise.
                 The CA bit is not altered by compare instructions, nor by other instructions
                 that cannot carry (except shift right algebraic, mtspr to the XER, and mcrxr).

3-24             Reserved

25-31            This field specifies the number of bytes to be transferred by a Load String Word
                 Indexed (lswx) or Store String Word Indexed (stswx) instruction.
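
The SO, OV, and byte-count definitions above can be sketched in C as follows, assuming a 32-bit implementation. The struct xer model and the function names are hypothetical; addo computes OV by comparing the carry into the msb with the carry out of it, and leaves SO set once OV has been set.

```c
#include <stdint.h>

/* Hypothetical XER model; only SO, OV, and CA are tracked. */
struct xer { unsigned so, ov, ca; };

/* Sketch of an add with OE = 1 (addo). OV is set when the carry out of the msb (bit 0)
   differs from the carry out of bit 1, which is the carry into the msb. SO is sticky.
   Plain add does not alter CA, so x->ca is left untouched. */
static uint32_t addo(struct xer *x, uint32_t a, uint32_t b)
{
    unsigned carry_into_msb   = (unsigned)(((a & 0x7FFFFFFFu) + (b & 0x7FFFFFFFu)) >> 31);
    unsigned carry_out_of_msb = (unsigned)(((uint64_t)a + b) >> 32);
    x->ov = carry_into_msb ^ carry_out_of_msb;
    x->so |= x->ov;                 /* once set, SO stays set */
    return a + b;
}

/* Byte count for lswx/stswx: XER bits 25-31, the low seven bits of the register value. */
static unsigned xer_byte_count(uint32_t xer)
{
    return xer & 0x7Fu;
}
```

Because SO is sticky, an overflowing addo followed by one that does not overflow leaves OV = 0 but SO = 1.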


2.1.6 Link Register (LR) 


The link register (LR) is a 64-bit register in 64-bit implementations and a 32-bit register in
32-bit implementations. The LR supplies the branch target address for the Branch
Conditional to Link Register (bclrx) instructions, and in the case of a branch with link
update instruction, can be used to hold the logical address of the instruction that follows the
branch with link update instruction (for returning from a subroutine). The format of LR is
shown in Figure 2-7.


Branch Address, bits 0-63 (32 bits wide on 32-bit implementations)
Figure 2-7. Link Register (LR) 





Note that although the two least-significant bits can accept any values written to them, they 
are ignored when the LR is used as an address. Both conditional and unconditional branch 
instructions include the option of placing the logical address of the instruction following 
the branch instruction in the LR. 
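
A minimal C sketch of the two LR operations described above follows, assuming a 32-bit implementation and 4-byte instructions; the helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers for a 32-bit implementation. A branch with link (for example, bl)
   places the address of the following instruction in the LR; bclr (for example, blr)
   branches to the LR with its two least-significant bits ignored. */
static uint32_t lr_after_link(uint32_t branch_address)
{
    return branch_address + 4;       /* instructions are 4 bytes long */
}

static uint32_t bclr_target(uint32_t lr)
{
    return lr & ~UINT32_C(3);        /* bits 30-31 are ignored when the LR is used as an address */
}
```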


The link register can also be accessed by the mtspr and mfspr instructions using SPR 8. 
Prefetching instructions along the target path (loaded by an mtspr instruction) is possible 
provided the link register is loaded sufficiently ahead of the branch instruction (so that any 
branch prediction hardware can calculate the branch address). Additionally, PowerPC 
processors can prefetch along a target path loaded by a branch and link instruction. 


Note that some PowerPC processors may keep a stack of the LR values most recently set 
by branch with link update instructions. To benefit from these enhancements, use of the link 
register should be restricted to the manner described in Section 4.2.4.2, “Conditional 
Branch Control.” 


2.1.7 Count Register (CTR) 


The count register (CTR) is a 64-bit register in 64-bit implementations and a 32-bit register
in 32-bit implementations. The CTR can hold a loop count that can be decremented during
execution of branch instructions that contain an appropriately coded BO field. If the value
in CTR is 0 before being decremented, it is 0xFFFF_FFFF (2^32 - 1) afterward. The CTR can
also provide the branch target address for the Branch Conditional to Count Register
(bcctrx) instruction. The CTR is shown in Figure 2-8.


CTR 


Figure 2-8. Count Register (CTR) 


Prefetching instructions along the target path is also possible provided the count register is 
loaded sufficiently ahead of the branch instruction (so that any branch prediction hardware 
can calculate the correct value of the loop count). 


The count register can also be accessed by the mtspr and mfspr instructions by specifying
SPR 9. In branch conditional instructions, the BO field specifies the conditions under which
the branch is taken. The first four bits of the BO field specify how the branch is affected by
or affects the CR and the CTR. The encoding for the BO field is shown in Table 2-7.


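Table 2-7. BO Operand Encodings

BO      Description
0000y   Decrement the CTR, then branch if the decremented CTR ≠ 0 and the condition is FALSE.
0001y   Decrement the CTR, then branch if the decremented CTR = 0 and the condition is FALSE.
001zy   Branch if the condition is FALSE.
0100y   Decrement the CTR, then branch if the decremented CTR ≠ 0 and the condition is TRUE.
0101y   Decrement the CTR, then branch if the decremented CTR = 0 and the condition is TRUE.
011zy   Branch if the condition is TRUE.
1z00y   Decrement the CTR, then branch if the decremented CTR ≠ 0.
1z01y   Decrement the CTR, then branch if the decremented CTR = 0.
1z1zz   Branch always.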


Notes: The y bit provides a hint about whether a conditional branch is likely to be taken and is used by 
some PowerPC implementations to improve performance. Other implementations may ignore the 
y bit. 
The z indicates a bit that is ignored. The z bits should be cleared (zero), as they may be assigned 
a meaning in a future version of the PowerPC UISA. 
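
The encodings in Table 2-7 can be evaluated mechanically. The C sketch below, which assumes a 32-bit implementation, decrements the CTR when BO[2] is 0, then combines the CTR test and the condition test as the table describes. The function name is hypothetical; cond stands for the CR bit selected by the instruction's BI field, and the y bit is ignored because it is only a hint.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a conditional branch's BO field (5 bits; BO[0] is the
   most-significant bit, 0x10). Returns true if the branch is taken. */
static bool bc_taken(unsigned bo, bool cond, uint32_t *ctr)
{
    bool no_cond_test  = bo & 0x10;   /* BO[0]: ignore the CR bit */
    bool want_true     = bo & 0x08;   /* BO[1]: branch on TRUE (1) or FALSE (0) */
    bool no_ctr_test   = bo & 0x04;   /* BO[2]: do not decrement or test the CTR */
    bool want_ctr_zero = bo & 0x02;   /* BO[3]: branch on CTR = 0 (1) or CTR != 0 (0) */
    /* BO[4] (y) is only a prediction hint and does not change the outcome. */

    if (!no_ctr_test)
        *ctr -= 1;                    /* a CTR of 0 wraps to 0xFFFF_FFFF */
    bool ctr_ok  = no_ctr_test || ((*ctr != 0) != want_ctr_zero);
    bool cond_ok = no_cond_test || (cond == want_true);
    return ctr_ok && cond_ok;
}
```

For example, with BO = 10000 (1z00y) and a CTR of 0, bc_taken decrements the CTR to 0xFFFF_FFFF and reports the branch as taken.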





2.2 PowerPC VEA Register Set—Time Base 


The PowerPC virtual environment architecture (VEA) defines registers in addition to those
defined by the UISA. The PowerPC VEA register set can be accessed by all software with
either user- or supervisor-level privileges. Figure 2-9 provides a graphic illustration of the
PowerPC VEA register set. Note that this programming model is similar to that shown in
Figure 2-1; however, the PowerPC VEA registers are now included.


The PowerPC VEA introduces the time base facility (TB), a 64-bit structure that consists 
of two 32-bit registers—time base upper (TBU) and time base lower (TBL). Note that the 
time base registers can be accessed by both user- and supervisor-level instructions. In the 
context of the VEA, user-level applications are permitted read-only access to the TB. The 
OEA defines supervisor-level access to the TB for writing values to the TB. See
Section 2.3.12, “Time Base Facility (TB)—OEA,” for more information.


In Figure 2-9, the numbers to the right of the register names indicate the numbers used
in the syntax of the instruction operands to access the registers (for example, the number
used to access the XER is SPR 1).


Note that the general-purpose registers (GPRs), link register (LR), and count register (CTR) 
are 64 bits on 64-bit implementations and 32 bits on 32-bit implementations. These 
registers are described fully in Section 2.1, “PowerPC UISA Register Set.” 
[Figure: the user model, comprising the UISA registers (GPR0-GPR31, FPR0-FPR31, CR, FPSCR,
XER (SPR 1), LR (SPR 8), and CTR (SPR 9)) and the VEA time base facility for reading
(TBL, TBR 268; TBU, TBR 269), shown beside the supervisor model (OEA configuration, memory
management, exception handling, and miscellaneous registers). Figure notes: some registers,
such as CR, FPSCR, XER, TBL, and TBU, are 32-bit registers only; the segment registers exist
on 32-bit implementations only; the ASR exists on 64-bit implementations only; in 64-bit
implementations, TBR 268 is read as a 64-bit value.]

Figure 2-9. VEA Programming Model—User-Level Registers Plus Time Base 
The time base (TB), shown in Figure 2-10, is a 64-bit structure that contains a 64-bit
unsigned integer that is incremented periodically. Each increment adds 1 to the low-order
bit (bit 31 of TBL). The frequency at which the counter is incremented is
implementation-dependent.





TBU: upper 32 bits of time base (bits 0-31) | TBL: lower 32 bits of time base (bits 0-31)


Figure 2-10. Time Base (TB) 





The TB increments until its value becomes 0xFFFF_FFFF_FFFF_FFFF (2^64 - 1). At the
next increment its value becomes 0x0000_0000_0000_0000. Note that there is no explicit
indication that this has occurred (that is, no exception is generated).


The period of the time base depends on the driving frequency. The TB is implemented such 
that the following requirements are satisfied: 


1. Loading a GPR from the time base has no effect on the accuracy of the time base. 


2. Storing a GPR to the time base replaces the value in the time base with the value in 
the GPR. 


The PowerPC VEA does not specify a relationship between the frequency at which the time 
base is updated and other frequencies, such as the processor clock. The TB update 
frequency is not required to be constant; however, for the system software to maintain time 
of day and operate interval timers, one of two things is required: 


• The system provides an implementation-dependent exception to software whenever
  the update frequency of the time base changes and a means to determine the current
  update frequency; or

• The system software controls the update frequency of the time base.


Note that if the operating system initializes the TB to some reasonable value and the update 
frequency of the TB is constant, the TB can be used as a source of values that increase at a 
constant rate, such as for time stamps in trace entries. 


Even if the update frequency is not constant, values read from the TB are monotonically
increasing (except when the TB wraps from 2^64 - 1 to 0). If a trace entry is recorded each
time the update frequency changes, the sequence of TB values can be postprocessed to
become actual time values.


However, successive readings of the time base may return identical values due to 
implementation-dependent factors such as a low update frequency or initialization. 
2.2.1 Reading the Time Base 


The mftb instruction is used to read the time base. For specific details on using the mftb 
instruction, see Chapter 8, “Instruction Set.” For information on writing the time base, see 
Section 2.3.12.1, “Writing to the Time Base.” 


On 32-bit implementations, it is not possible to read the entire 64-bit time base in a single 
instruction. The mftb simplified mnemonic moves from the lower half of the time base 
register (TBL) to a GPR, and the mftbu simplified mnemonic moves from the upper half 
of the time base (TBU) to a GPR. 


Because of the possibility of a carry from TBL to TBU occurring between reads of the TBL 
and TBU, a sequence such as the following example is necessary to read the time base on 
32-bit implementations: 


loop:
    mftbu   rx          #load from TBU
    mftb    ry          #load from TBL
    mftbu   rz          #load from TBU
    cmpw    rz,rx       #see if ‘old’ = ‘new’
    bne     loop        #loop if carry occurred


The comparison and loop are necessary to ensure that a consistent pair of values has been 
obtained. The previous example will also work on 64-bit implementations running in either 
64-bit or 32-bit mode. 
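
The same retry logic can be written in C. In the sketch below, read_tbu and read_tbl are hypothetical callbacks standing in for mftbu and mftb, and struct sim_tb is a simulated time base, advanced on every access, that is included only so the example can run off-target.

```c
#include <stdint.h>

/* Hypothetical accessor type standing in for mftbu/mftb on a 32-bit implementation. */
typedef uint32_t (*tb_reader)(void *ctx);

/* Read the 64-bit time base as two halves, retrying if TBL carried into TBU between reads. */
static uint64_t read_time_base(tb_reader read_tbu, tb_reader read_tbl, void *ctx)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_tbu(ctx);   /* mftbu rx */
        lo  = read_tbl(ctx);   /* mftb  ry */
        hi2 = read_tbu(ctx);   /* mftbu rz */
    } while (hi != hi2);       /* cmpw rz,rx / bne loop */
    return ((uint64_t)hi << 32) | lo;
}

/* Simulated time base for running the sketch off-target: every access advances it by step. */
struct sim_tb { uint64_t value; uint64_t step; };

static uint32_t sim_read_tbu(void *ctx)
{
    struct sim_tb *tb = ctx;
    tb->value += tb->step;
    return (uint32_t)(tb->value >> 32);
}

static uint32_t sim_read_tbl(void *ctx)
{
    struct sim_tb *tb = ctx;
    tb->value += tb->step;
    return (uint32_t)tb->value;
}
```

Starting the simulation at 0xFFFF_FFFE with a step of 1, the first pass sees TBU read as 0 and then 1, so the loop retries and returns 0x1_0000_0003.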


2.2.2 Computing Time of Day from the Time Base 


Since the update frequency of the time base is system-dependent, the algorithm for 
converting the current value in the time base to time of day is also system-dependent. 


In a system in which the update frequency of the time base may change over time, it is not 
possible to convert an isolated time base value into time of day. Instead, a time base value 
has meaning only with respect to the current update frequency and the time of day that the 
update frequency was last changed. Each time the update frequency changes, either the 
system software is notified of the change via an exception, or else the change was instigated 
by the system software itself. At each such change, the system software must compute the 
current time of day using the old update frequency, compute a new value of
ticks-per-second for the new frequency, and save the time of day, time base value, and tick rate.
Subsequent calls to compute time of day use the current time base value and the saved data. 


A generalized service to compute time of day could take the following as input: 
• Time of day at beginning of current epoch
• Time base value at beginning of current epoch
• Time base update frequency
• Time base value for which time of day is desired
For a PowerPC system in which the time base update frequency does not vary, the first three
inputs would be constant. 
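A minimal C sketch of such a service follows. It assumes time of day is kept as nanoseconds since some reference point and that the requested TB value lies in the current epoch (no wrap); the tb_epoch structure and tb_to_tod_ns function are hypothetical names, not part of any PowerPC-defined interface.

```c
#include <assert.h>
#include <stdint.h>

/* Data saved by system software each time the TB update frequency changes. */
typedef struct {
    uint64_t tod_ns;   /* time of day (ns) at the start of the current epoch */
    uint64_t tb_start; /* time base value at the start of the current epoch */
    uint64_t freq_hz;  /* time base update frequency, in ticks per second */
} tb_epoch;

/* Convert a time base value taken during this epoch into time of day (ns). */
static uint64_t tb_to_tod_ns(const tb_epoch *e, uint64_t tb) {
    assert(e->freq_hz != 0 && tb >= e->tb_start);
    uint64_t ticks = tb - e->tb_start;
    /* Split into whole seconds and a remainder so that the remainder times
       1e9 cannot overflow for frequencies up to about 18 GHz. */
    uint64_t secs = ticks / e->freq_hz;
    uint64_t rem  = ticks % e->freq_hz;
    return e->tod_ns + secs * 1000000000ULL + rem * 1000000000ULL / e->freq_hz;
}
```

For example, with an illustrative 25-MHz update frequency, a TB value 87,500,000 ticks past the start of the epoch is 3.5 seconds later than the saved time of day. In a fixed-frequency system the three epoch fields are set once; otherwise they are refreshed at each frequency change.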


2.3 PowerPC OEA Register Set 


The PowerPC operating environment architecture (OEA) completes the discussion of
PowerPC registers. Figure 2-11 shows a graphic representation of the entire PowerPC
register set: UISA, VEA, and OEA. In Figure 2-11, the number to the right of each register
name indicates the number that is used in the syntax of the instruction operands to access
the register (for example, the number used to access the XER is SPR 1).


All of the SPRs in the OEA can be accessed only by supervisor-level instructions; any 
attempt to access these SPRs with user-level instructions results in a supervisor-level 
exception. Some SPRs are implementation-specific. In some cases, not all of a register’s 
bits are implemented in hardware. 


If a PowerPC processor executes an mtspr/mfspr instruction with an undefined SPR 
encoding, it takes (depending on the implementation) an illegal instruction program 
exception, a privileged instruction program exception, or the results are boundedly 
undefined. See Section 6.4.7, “Program Exception (0x00700),” for more information. 


Note that the GPRs, LR, CTR, TBL, MSR, DAR, SDR1, SRR0, SRR1, and
SPRG0-SPRG3 are 64 bits wide on 64-bit implementations and 32 bits wide on 32-bit
implementations.
USER MODEL: UISA
  General-Purpose Registers              GPR0-GPR31 (64/32)
  Floating-Point Registers               FPR0-FPR31 (64)
  Condition Register [1]                 CR (32)
  Floating-Point Status and
    Control Register [1]                 FPSCR (32)
  XER Register [1]                       XER (32)        SPR 1
  Link Register                          LR (64/32)      SPR 8
  Count Register                         CTR (64/32)     SPR 9

USER MODEL: VEA
  Time Base Facility (For Reading) [1]   TBL (32)        TBR 268 [4]
                                         TBU (32)        TBR 269

SUPERVISOR MODEL: OEA
  Configuration Registers
    Machine State Register               MSR (64/32)
    Processor Version Register [1]       PVR (32)
  Memory Management Registers
    Instruction BAT Registers            IBAT0U (64/32)  SPR 528
                                         IBAT0L (64/32)  SPR 529
                                         IBAT1U (64/32)  SPR 530
                                         IBAT1L (64/32)  SPR 531
                                         IBAT2U (64/32)  SPR 532
                                         IBAT2L (64/32)  SPR 533
                                         IBAT3U (64/32)  SPR 534
                                         IBAT3L (64/32)  SPR 535
    Data BAT Registers                   DBAT0U (64/32)  SPR 536
                                         DBAT0L (64/32)  SPR 537
                                         DBAT1U (64/32)  SPR 538
                                         DBAT1L (64/32)  SPR 539
                                         DBAT2U (64/32)  SPR 540
                                         DBAT2L (64/32)  SPR 541
                                         DBAT3U (64/32)  SPR 542
                                         DBAT3L (64/32)  SPR 543
    SDR1                                 SDR1 (64/32)    SPR 25
    Address Space Register [3]           ASR (64)        SPR 280
    Segment Registers [1][2]             SR0-SR15 (32)
  Exception Handling Registers
    Data Address Register                DAR (64/32)     SPR 19
    DSISR [1]                            DSISR (32)      SPR 18
    SPRGs                                SPRG0 (64/32)   SPR 272
                                         SPRG1 (64/32)   SPR 273
                                         SPRG2 (64/32)   SPR 274
                                         SPRG3 (64/32)   SPR 275
    Save and Restore Registers           SRR0 (64/32)    SPR 26
                                         SRR1 (64/32)    SPR 27
    Floating-Point Exception Cause
      Register (Optional)                FPECR           SPR 1022
  Miscellaneous Registers
    Time Base Facility (For Writing) [1] TBL (32)        SPR 284
                                         TBU (32)        SPR 285
    Decrementer [1]                      DEC (32)        SPR 22
    Data Address Breakpoint Register
      (Optional)                         DABR (64/32)    SPR 1013
    External Access Register
      (Optional) [1]                     EAR (32)        SPR 282
    Processor Identification
      Register (Optional)                PIR

[1] These registers are 32-bit registers only.
[2] These registers are on 32-bit implementations only.
[3] These registers are on 64-bit implementations only.
[4] In 64-bit implementations, TBR 268 is read as a 64-bit value.

Figure 2-11. OEA Programming Model—All Registers
A description of the PowerPC OEA supervisor-level registers follows:
• Configuration registers


— Machine state register (MSR). The MSR defines the state of the processor. The 
MSR can be modified by the Move to Machine State Register (mtmsr), System 
Call (sc), and Return from Interrupt (rfi) instructions. It can be read by the Move 
from Machine State Register (mfmsr) instruction. For more information, see 
Section 2.3.1, “Machine State Register (MSR).” 


— Processor version register (PVR). This register is a read-only register that 
identifies the version (model) and revision level of the PowerPC processor. For 
more information, see Section 2.3.2, “Processor Version Register (PVR).” 


• Memory management registers


— Block-address translation (BAT) registers. The PowerPC OEA includes eight
pairs of block-address translation registers (BATs), consisting of four pairs of instruction
BATs (IBAT0U-IBAT3U and IBAT0L-IBAT3L) and four pairs of data BATs
(DBAT0U-DBAT3U and DBAT0L-DBAT3L). See Figure 2-11 for a list of the
SPR numbers for the BAT registers. Refer to Section 2.3.3, “BAT Registers,” for
more information.


— SDR1. The SDR1 register specifies the page table base address used in
virtual-to-physical address translation. For more information, see Section 2.3.4,
“SDR1.” (Note that physical address is referred to as real address in the 
architecture specification.) 


— Segment registers (SR). The PowerPC OEA defines sixteen 32-bit segment
registers (SR0-SR15). Note that the SRs are implemented on 32-bit
implementations only. The fields in the segment register are interpreted 
differently depending on the value of bit 0. For more information, see 
Section 2.3.5, “Segment Registers.” 

• Exception handling registers


— Data address register (DAR). After a DSI or an alignment exception, DAR is set 
to the effective address generated by the faulting instruction. For more 
information, see Section 2.3.6, “Data Address Register (DAR).” 


— SPRG0-SPRG3. The SPRG0-SPRG3 registers are provided for operating
system use. For more information, see Section 2.3.7, “SPRG0-SPRG3.”


— DSISR. The DSISR defines the cause of DSI and alignment exceptions. For more 
information, refer to Section 2.3.8, “DSISR.” 


Chapter 2. PowerPC Register Set 2-19 


— Machine status save/restore register 0 (SRR0). The SRR0 register is used to save
machine status on exceptions and to restore machine status when an rfi
instruction is executed. For more information, see Section 2.3.9, “Machine
Status Save/Restore Register 0 (SRR0).”


— Machine status save/restore register 1 (SRR1). The SRR1 register is used to save 
machine status on exceptions and to restore machine status when an rfi 
instruction is executed. For more information, see Section 2.3.10, “Machine 
Status Save/Restore Register 1 (SRR1).” 


— Floating-point exception cause register (FPECR). This optional register is used 
to identify the cause of a floating-point exception. 


• Miscellaneous registers


— Time base (TB). The TB is a 64-bit structure that maintains the time of day and 
operates interval timers. The TB consists of two 32-bit registers—time base 
upper (TBU) and time base lower (TBL). Note that the time base registers can be 
accessed by both user- and supervisor-level instructions. For more information, 
see Section 2.3.12, “Time Base Facility (TB)—OEA” and Section 2.2,
“PowerPC VEA Register Set—Time Base.” 


— Decrementer register (DEC). This register is a 32-bit decrementing counter that 
provides a mechanism for causing a decrementer exception after a 
programmable delay; the frequency is a subdivision of the processor clock. For 
more information, see Section 2.3.13, “Decrementer Register (DEC).” 


— External access register (EAR). This optional register is used in conjunction with 
the eciwx and ecowx instructions. Note that the EAR register and the eciwx and 
ecowx instructions are optional in the PowerPC architecture and may not be 
supported in all PowerPC processors that implement the OEA. For more 
information about the external control facility, see Section 4.3.4, “External 
Control Instructions.” 


— Data address breakpoint register (DABR). This optional register is used to 
control the data address breakpoint facility. Note that the DABR is optional in 
the PowerPC architecture and may not be supported in all PowerPC processors 
that implement the OEA. For more information about the data address
breakpoint facility, see Section 6.4.3, “DSI Exception (0x00300).” 


— Processor identification register (PIR). This optional register is used to hold a 
value that distinguishes an individual processor in a multiprocessor environment. 


2.3.1 Machine State Register (MSR) 


The machine state register (MSR) is a 64-bit register on 64-bit implementations and a
32-bit register on 32-bit implementations (see Figure 2-12). The MSR defines the state of the
processor. When an exception occurs, MSR bits, as described in Table 2-8, are altered as 
determined by the exception. The MSR can also be modified by the mtmsr, sc, and rfi 
instructions. It can be read by the mfmsr instruction. 
Reserved bits (shown as 0): 0-12, 14, 24, and 28-29

Bit   0-12   13   14   15   16   17   18   19   20   21   22   23   24   25   26   27   28-29  30   31
Name  0...0  POW  0    ILE  EE   PR   FP   ME   FE0  SE   BE   FE1  0    IP   IR   DR   00     RI   LE


Figure 2-12. Machine State Register (MSR)


Table 2-8 shows the bit definitions for the MSR. 


Table 2-8. MSR Bit Settings

Bit(s)  Name  Description
0-12    -     Reserved
13      POW   Power management enable
                0 Power management disabled (normal operation mode)
                1 Power management enabled (reduced power mode)
              Note: Power management functions are implementation-dependent. If the
              function is not implemented, this bit is treated as reserved.
14      -     Reserved
15      ILE   Exception little-endian mode. When an exception occurs, this bit is copied
              into MSR[LE] to select the endian mode for the context established by the
              exception.
16      EE    External interrupt enable
                0 While the bit is cleared, the processor delays recognition of external
                  interrupts and decrementer exception conditions.
                1 The processor is enabled to take an external interrupt or the
                  decrementer exception.
17      PR    Privilege level
                0 The processor can execute both user- and supervisor-level instructions.
                1 The processor can only execute user-level instructions.
18      FP    Floating-point available
                0 The processor prevents dispatch of floating-point instructions,
                  including floating-point loads, stores, and moves.
                1 The processor can execute floating-point instructions.
19      ME    Machine check enable
                0 Machine check exceptions are disabled.
                1 Machine check exceptions are enabled.
20      FE0   Floating-point exception mode 0 (see Table 2-9).
21      SE    Single-step trace enable (Optional)
                0 The processor executes instructions normally.
                1 The processor generates a single-step trace exception upon the
                  successful execution of the next instruction.
              Note: If the function is not implemented, this bit is treated as reserved.
22      BE    Branch trace enable (Optional)
                0 The processor executes branch instructions normally.
                1 The processor generates a branch trace exception after completing the
                  execution of a branch instruction, regardless of whether the branch
                  was taken.
              Note: If the function is not implemented, this bit is treated as reserved.
23      FE1   Floating-point exception mode 1 (see Table 2-9).
24      -     Reserved
25      IP    Exception prefix. The setting of this bit specifies whether an exception
              vector offset is prepended with Fs or 0s. In the following description,
              nnnnn is the offset of the exception vector. See Table 6-2.
                0 Exceptions are vectored to the physical address 0x000n_nnnn in 32-bit
                  implementations and 0x0000_0000_000n_nnnn in 64-bit implementations.
                1 Exceptions are vectored to the physical address 0xFFFn_nnnn in 32-bit
                  implementations and 0x0000_0000_FFFn_nnnn in 64-bit implementations.
              In most systems, IP is set to 1 during system initialization, and then
              cleared to 0 when initialization is complete.
26      IR    Instruction address translation
                0 Instruction address translation is disabled.
                1 Instruction address translation is enabled.
              For more information, see Chapter 7, “Memory Management.”
27      DR    Data address translation
                0 Data address translation is disabled.
                1 Data address translation is enabled.
              For more information, see Chapter 7, “Memory Management.”
28-29   -     Reserved
30      RI    Recoverable exception (for system reset and machine check exceptions)
                0 Exception is not recoverable.
                1 Exception is recoverable.
              For more information, see Chapter 6, “Exceptions.”
31      LE    Little-endian mode enable
                0 The processor runs in big-endian mode.
                1 The processor runs in little-endian mode.

The floating-point exception mode bits (FE0 and FE1) are interpreted as shown in Table 2-9.
Table 2-9. Floating-Point Exception Mode Bits

FE0  FE1  Mode
0    0    Floating-point exceptions disabled
0    1    Floating-point imprecise nonrecoverable
1    0    Floating-point imprecise recoverable
1    1    Floating-point precise mode
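PowerPC documentation numbers register bits from 0 at the most-significant end, so bit n of a 32-bit register corresponds to the C mask 1u << (31 - n). The following C sketch applies that convention to several fields from Table 2-8 and Table 2-9, including the exception vector address selected by MSR[IP]. The macro and function names are hypothetical; only the bit positions come from the tables.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for big-endian (IBM) bit number n of a 32-bit register. */
#define PPC_BIT32(n) (1u << (31 - (n)))

/* Hypothetical names for MSR fields from Table 2-8. */
#define MSR_EE  PPC_BIT32(16)
#define MSR_PR  PPC_BIT32(17)
#define MSR_FE0 PPC_BIT32(20)
#define MSR_FE1 PPC_BIT32(23)
#define MSR_IP  PPC_BIT32(25)
#define MSR_LE  PPC_BIT32(31)

/* Floating-point exception mode as the two-bit value FE0:FE1 (Table 2-9). */
static unsigned msr_fp_exc_mode(uint32_t msr) {
    return ((msr & MSR_FE0) ? 2u : 0u) | ((msr & MSR_FE1) ? 1u : 0u);
}

/* Physical exception vector address on a 32-bit implementation:
   the offset nnnnn prefixed by 0x000 (IP = 0) or 0xFFF (IP = 1). */
static uint32_t exception_vector(uint32_t msr, uint32_t offset) {
    assert(offset <= 0x000FFFFFu);  /* nnnnn is a 20-bit offset */
    return ((msr & MSR_IP) ? 0xFFF00000u : 0x00000000u) | offset;
}
```

For example, a program exception (vector offset 0x00700; see Section 6.4.7) vectors to 0xFFF00700 while IP = 1 and to 0x00000700 once initialization clears IP.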
Table 2-10 indicates the initial state of the MSR at power up.


Table 2-10. State of MSR at Power Up 


Bit(s)  Name  32-Bit Default Value
0-12    -     Unspecified
              Unspecified
              Unspecified [1]
              Unspecified [1]

[1] Unspecified can be either 0 or 1
[2] 1 is typical, but might be 0


2.3.2 Processor Version Register (PVR) 


The processor version register (PVR) is a 32-bit, read-only register that contains a value 
identifying the specific version (model) and revision level of the PowerPC processor (see 
Figure 2-13). The contents of the PVR can be copied to a GPR by the mfspr instruction. 
Read access to the PVR is supervisor-level only; write access is not provided. 


 Version               Revision
 0                15   16                31


Figure 2-13. Processor Version Register (PVR)
The PVR consists of two 16-bit fields:


• Version (bits 0-15)—A 16-bit number that uniquely identifies a particular processor
version. This number can be used to determine the version of a processor; it may not 
distinguish between different end product models if more than one model uses the 
same processor. 


• Revision (bits 16-31)—A 16-bit number that distinguishes between various releases
of a particular version (that is, an engineering change level). The value of the 
revision portion of the PVR is implementation-specific. The processor revision level 
is changed for each revision of the device. 
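Software that checks the PVR usually compares the version field while tolerating different revision levels. A minimal C sketch, assuming the 32-bit value has already been copied into a GPR with mfspr; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Version field: PVR bits 0-15 (the high-order halfword). */
static uint16_t pvr_version(uint32_t pvr)  { return (uint16_t)(pvr >> 16); }

/* Revision field: PVR bits 16-31 (the low-order halfword). */
static uint16_t pvr_revision(uint32_t pvr) { return (uint16_t)(pvr & 0xFFFFu); }

/* True if two PVR values name the same processor version, ignoring revision. */
static int pvr_same_version(uint32_t a, uint32_t b) {
    return pvr_version(a) == pvr_version(b);
}
```

Matching on the version alone keeps such a check valid across engineering change levels, since the revision value is implementation-specific.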


2.3.3 BAT Registers 


The BAT registers (BATs) maintain the address translation information for eight blocks of 
memory. The BATs are maintained by the system software and are implemented as eight 
pairs of special-purpose registers (SPRs). Each block is defined by a pair of SPRs called 
upper and lower BAT registers. These BAT registers define the starting addresses and sizes 
of BAT areas. 


The PowerPC OEA defines the BAT registers as eight instruction block-address translation
(IBAT) registers, consisting of four pairs of instruction BATs, or IBATs (IBAT0U-IBAT3U
and IBAT0L-IBAT3L), and eight data BATs, or DBATs (DBAT0U-DBAT3U and
DBAT0L-DBAT3L). See Figure 2-11 for a list of the SPR numbers for the BAT registers.


Figure 2-14 and Figure 2-15 show the format of the upper and lower BAT registers for 
32-bit PowerPC processors.


Reserved: bits 15-18

 BEPI                 0000     BL                 Vs   Vp
 0               14   15  18   19             29  30   31

Figure 2-14. Upper BAT Register
Reserved: bits 15-24 and 29

 BRPN                 0 0000 0000 0   WIMG*       0    PP
 0               14   15          24  25     28   29   30 31

*W and G bits are not defined for IBAT registers. Attempting to write to these bits causes boundedly-undefined results.
Figure 2-15. Lower BAT Register


Table 2-11 describes the bits in the BAT registers.
Table 2-11. BAT Registers—Field and Bit Descriptions

Register   Bits    Name   Description
Upper BAT  0-14    BEPI   Block effective page index. This field is compared with high-order
                          bits of the logical address to determine if there is a hit in that
                          BAT array entry. (Note that the architecture specification refers to
                          logical address as effective address.)
           15-18   -      Reserved
           19-29   BL     Block length. BL is a mask that encodes the size of the block.
                          Values for this field are listed in Table 2-12.
           30      Vs     Supervisor mode valid bit. This bit interacts with MSR[PR] to
                          determine if there is a match with the logical address. For more
                          information, see Section 7.4.2, “Recognition of Addresses in BAT
                          Arrays.”
           31      Vp     User mode valid bit. This bit also interacts with MSR[PR] to
                          determine if there is a match with the logical address. For more
                          information, see Section 7.4.2, “Recognition of Addresses in BAT
                          Arrays.”
Lower BAT  0-14    BRPN   This field is used in conjunction with the BL field to generate
                          high-order bits of the physical address of the block.
           15-24   -      Reserved
           25-28   WIMG   Memory/cache access mode bits
                            W  Write-through
                            I  Caching-inhibited
                            M  Memory coherence
                            G  Guarded
                          Attempting to write to the W and G bits in IBAT registers causes
                          boundedly-undefined results. For detailed information about the
                          WIMG bits, see Section 5.2.1, “Memory/Cache Access Attributes.”
           29      -      Reserved
           30-31   PP     Protection bits for block. This field determines the protection for
                          the block as described in Section 7.4.4, “Block Memory Protection.”
Table 2-12 lists the BAT area lengths encoded in BAT[BL].
Table 2-12. BAT Area Lengths 


Only the values shown in Table 2-12 are valid for the BL field. The rightmost bit of BL is 
aligned with bit 14 of the logical address. A logical address is determined to be within a 
BAT area if the logical address matches the value in the BEPI field. 





The boundary between the cleared bits and set bits (0s and 1s) in BL determines the bits of
the logical address that participate in the comparison with BEPI. Bits in the logical address
corresponding to set bits in BL are cleared for this comparison. Bits in the logical address 
corresponding to set bits in the BL field, concatenated with the 17 bits of the logical address 
to the right (less significant bits) of BL, form the offset within the BAT area. This is 
described in detail in Chapter 7, “Memory Management.” 


The value loaded into BL determines both the length of the BAT area and the alignment of 
the area in both logical and physical address space. The values loaded into BEPI and BRPN 
must have at least as many low-order zeros as there are ones in BL. 
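The matching and offset rules above can be sketched in C for a 32-bit implementation. This is an illustrative model under stated assumptions, not the architected algorithm from Chapter 7: the function names are hypothetical, the valid-bit check selects Vs when MSR[PR] = 0 and Vp when MSR[PR] = 1, and the WIMG and PP fields are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BAT_EPI_MASK 0xFFFE0000u   /* BEPI / BRPN: bits 0-14 */

/* BL field (upper BAT bits 19-29), right-justified. */
static uint32_t bat_bl(uint32_t ubat) { return (ubat >> 2) & 0x7FFu; }

/* Block length in bytes: 128 Kbytes for BL = 0, doubling for each 1 bit in BL. */
static uint32_t bat_block_len(uint32_t ubat) { return (bat_bl(ubat) + 1u) << 17; }

/* Hypothetical helper: returns true and sets *pa if ea hits in this BAT pair. */
static bool bat_translate(uint32_t ubat, uint32_t lbat, uint32_t ea,
                          bool msr_pr, uint32_t *pa) {
    bool vs = (ubat >> 1) & 1u;            /* bit 30 */
    bool vp = ubat & 1u;                   /* bit 31 */
    if (!(msr_pr ? vp : vs))
        return false;                      /* not valid in this privilege state */

    uint32_t bl_mask = bat_bl(ubat) << 17; /* rightmost BL bit lines up with EA bit 14 */
    uint32_t bepi    = ubat & BAT_EPI_MASK;
    uint32_t brpn    = lbat & BAT_EPI_MASK;
    /* BEPI and BRPN need as many low-order zeros as BL has ones. */
    assert((bepi & bl_mask) == 0 && (brpn & bl_mask) == 0);

    /* EA bits under 1 bits of BL are cleared before comparing with BEPI. */
    if ((ea & BAT_EPI_MASK & ~bl_mask) != bepi)
        return false;

    /* Offset: EA bits under 1 bits of BL plus the low-order 17 bits. */
    *pa = brpn | (ea & (bl_mask | 0x0001FFFFu));
    return true;
}
```

For example, an upper BAT of 0x10001FFE (BEPI = 0x1000_0000, BL = 0x7FF, Vs = 1, Vp = 0) with a lower BAT whose BRPN is 0x2000_0000 maps the 256-Mbyte area starting at 0x1000_0000 to 0x2000_0000 for supervisor accesses, so EA 0x1234_5678 translates to 0x2234_5678.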


Use of BAT registers is described in Chapter 7, “Memory Management.” 
2.3.4 SDR1


The SDR1 is a 64-bit register in 64-bit implementations and a 32-bit register in 32-bit
implementations. The 32-bit implementation of SDR1 is shown in Figure 2-16. 


Reserved: bits 16-22

 HTABORG              0000 000   HTABMASK
 0               15   16     22  23      31
Figure 2-16. SDR1


The bits of the 32-bit implementation of SDR1 are described in Table 2-13.
Table 2-13. SDR1 Bit Settings

Bits    Name      Description
0-15    HTABORG   The high-order 16 bits of the 32-bit physical address of the page table
16-22   -         Reserved
23-31   HTABMASK  Mask for page table address





In 32-bit implementations, the HTABORG field in SDR1 contains the high-order 16 bits of
the 32-bit physical address of the page table. Therefore, the page table is constrained to lie
on a 2^16-byte (64-Kbyte) boundary at a minimum. At least 10 bits from the hash function
are used to index into the page table. The page table must consist of at least 64 Kbytes (2^10
PTEGs of 64 bytes each).


The page table can be any size 2^n where 16 ≤ n ≤ 25. As the table size is increased, more
bits are used from the hash to index into the table and the value in HTABORG must have
more of its low-order bits equal to 0. The HTABMASK field in SDR1 contains a mask value
that determines how many bits from the hash are used in the page table index. This mask
must be of the form 0b00...011...1; that is, a string of 0 bits followed by a string of 1 bits.
The 1 bits determine how many bits from the hash, in addition to the minimum of 10, are
used in the index; HTABORG must have this same number of low-order bits equal to 0. See
Figure 7-23 for an example of the primary PTEG address generation in a 32-bit
implementation.


For example, suppose that the page table is 8,192 (2^13) 64-byte PTEGs, for a total size of
2^19 bytes (512 Kbytes). Note that a 13-bit index is required. Ten bits are provided from the
hash initially, so 3 additional bits from the hash must be selected. The value in
HTABMASK must be 0x007 and the value in HTABORG must have its low-order 3 bits
(bits 13-15 of SDR1) equal to 0. This means that the page table must begin on a
2^(3 + 10 + 6) = 2^19 = 512-Kbyte boundary.
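The arithmetic in this example generalizes to any legal table size. A C sketch, assuming the table size is given as n = log2(size in bytes); the function names are hypothetical, and the assert calls stand in for whatever error handling real system software would use:

```c
#include <assert.h>
#include <stdint.h>

/* HTABMASK for a page table of 2^n bytes (16 <= n <= 25): one 1 bit for
   each hash bit used beyond the minimum of 10. */
static uint32_t sdr1_htabmask(unsigned n) {
    assert(n >= 16 && n <= 25);
    return (1u << (n - 16)) - 1u;
}

/* Build a 32-bit SDR1 value from the page table's physical base address. */
static uint32_t sdr1_make(uint32_t htab_pa, unsigned n) {
    uint32_t mask = sdr1_htabmask(n);
    /* The table must be aligned on its own size, which gives HTABORG as many
       low-order zeros as HTABMASK has ones. */
    assert((htab_pa & ((1u << n) - 1u)) == 0);
    return (htab_pa & 0xFFFF0000u) | mask;  /* HTABORG: bits 0-15; HTABMASK: bits 23-31 */
}
```

For the 512-Kbyte table above (n = 19) placed at physical address 0x0008_0000, the sketch yields HTABMASK = 0x007 and an SDR1 value of 0x0008_0007, whose bits 13-15 are 0 as required.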


For more information, refer to Chapter 7, “Memory Management.” 
2.3.5 Segment Registers


The segment registers contain the segment descriptors for 32-bit implementations. For
32-bit processors, the OEA defines a segment register file of sixteen 32-bit registers. Segment
registers can be accessed by using the mtsr/mfsr and mtsrin/mfsrin instructions. The 
value of bit 0, the T bit, determines how the remaining register bits are interpreted. 
Figure 2-17 shows the format of a segment register when T = 0. 


Reserved: bits 4-7

 T   Ks  Kp  N   0000      VSID
 0   1   2   3   4     7   8                   31
Figure 2-17. Segment Register Format (T = 0)
Segment register bit settings when T = 0 are described in Table 2-14. 


Table 2-14. Segment Register Bit Settings (T = 0)

Bits   Name   Description
0      T      T = 0 selects this format
1      Ks     Supervisor-state protection key
2      Kp     User-state protection key
3      N      No-execute protection
4-7    -      Reserved
8-31   VSID   Virtual segment ID

Figure 2-18 shows the bit definition when T = 1.

 T   Ks  Kp  BUID           Controller-Specific Information
 0   1   2   3          11  12                             31

Figure 2-18. Segment Register Format (T = 1)
The bits in the segment register when T = 1 are described in Table 2-15.


Table 2-15. Segment Register Bit Settings (T = 1)

Bits    Name        Description
0       T           T = 1 selects this format.
1       Ks          Supervisor-state protection key
2       Kp          User-state protection key
3-11    BUID        Bus unit ID
12-31   CNTLR_SPEC  Device-specific data for I/O controller

If an access is translated by the block address translation (BAT) mechanism, the BAT 
translation takes precedence and the results of translation using segment registers are not 
used. However, if an access is not translated by a BAT, and T = 0 in the selected segment 
register, the effective address is a reference to a memory-mapped segment. In this case, the 
52-bit virtual address (VA) is formed by concatenating the following: 


• The 24-bit VSID field from the segment register
• The 16-bit page index, EA[4-19]
• The 12-bit byte offset, EA[20-31]


The VA is then translated to a physical address as described in Section 7.5, “Memory 
Segment Model.” 
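The concatenation can be written as a few shifts and masks. A C sketch, assuming the segment register is selected by the four high-order bits of the effective address (EA[0-3]); make_va and the sr array are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Form the 52-bit virtual address for a memory segment (T = 0). */
static uint64_t make_va(const uint32_t sr[16], uint32_t ea) {
    uint32_t seg = sr[ea >> 28];           /* EA[0-3] selects SR0-SR15 */
    assert((seg & 0x80000000u) == 0);      /* T bit (bit 0) must be 0 */
    uint64_t vsid = seg & 0x00FFFFFFu;     /* VSID: bits 8-31 */
    /* VSID (24 bits) || page index EA[4-19] (16 bits) || byte offset EA[20-31] (12 bits) */
    return (vsid << 28) | (ea & 0x0FFFFFFFu);
}
```

For example, if SR3 holds VSID 0x123456 with T = 0, EA 0x3ABC_DEF0 yields the VA 0x1_2345_6ABC_DEF0, with page index 0xABCD and byte offset 0xEF0.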


If T = 1 in the selected segment register (and the access is not translated by a BAT), the 
effective address is a reference to a direct-store segment. No reference is made to the page 
tables. However, note that the direct-store facility is being phased out of the architecture and
is not likely to be supported in future devices. Thus, all new programs should write a value
of zero to the T bit. For further discussion of address translation when T = 1, see 
Section 7.7, “Direct-Store Segment Address Translation.” 


2.3.6 Data Address Register (DAR) 


The DAR is a 64-bit register in 64-bit implementations and a 32-bit register in 32-bit 
implementations. The DAR is shown in Figure 2-19. 


DAR 





Figure 2-19. Data Address Register (DAR) 


The effective address generated by a memory access instruction is placed in the DAR if the 
access causes an exception (for example, an alignment exception). For more information, see
Chapter 6, “Exceptions.” 
2.3.7 SPRG0-SPRG3


SPRG0-SPRG3 are 64-bit or 32-bit registers, depending on the type of PowerPC processor.
They are provided for general operating system use, such as performing a fast state save or
for supporting multiprocessor implementations. The formats of SPRG0-SPRG3 are shown
in Figure 2-20.





SPRG0
SPRG1
SPRG2
SPRG3

Figure 2-20. SPRG0-SPRG3
Table 2-16 provides a description of conventional uses of SPRG0 through SPRG3.
Table 2-16. Conventional Uses of SPRG0-SPRG3


Register  Conventional Use
SPRG0     Software may load a unique physical address in this register to identify an area of
          memory reserved for use by the first-level exception handler. This area must be
          unique for each processor in the system.
SPRG1     This register may be used as a scratch register by the first-level exception handler
          to save the content of a GPR. That GPR then can be loaded from SPRG0 and used as a
          base register to save other GPRs to memory.
SPRG2     This register may be used by the operating system as needed.
SPRG3     This register may be used by the operating system as needed.





2.3.8 DSISR 


The 32-bit DSISR, shown in Figure 2-21, identifies the cause of DSI and alignment 
exceptions. 


DSISR 








Figure 2-21. DSISR 


For information about bit settings, see Section 6.4.3, “DSI Exception (0x00300),” and 
Section 6.4.6, “Alignment Exception (0x00600).” 
2.3.9 Machine Status Save/Restore Register 0 (SRR0)


The SRR0 is a 64-bit register in 64-bit implementations and a 32-bit register in 32-bit
implementations. SRR0 is used to save machine status on exceptions and restore machine
status when an rfi instruction is executed. It also holds the EA for the instruction that
follows the System Call (sc) instruction. The format of SRR0 is shown in Figure 2-22.


Reserved: bits 30-31

 SRR0                               00
 0                             29   30 31


Figure 2-22. Machine Status Save/Restore Register 0 (SRR0)


When an exception occurs, SRR0 is set to point to an instruction such that all prior
instructions have completed execution and no subsequent instruction has begun execution.
When an rfi instruction is executed, the contents of SRR0 are copied to the next instruction
address (NIA), the 64- or 32-bit address of the next instruction to be executed. The
instruction addressed by SRR0 may not have completed execution, depending on the
exception type. SRR0 addresses either the instruction causing the exception or the
immediately following instruction. The instruction addressed can be determined from the
exception type and status bits.


Note that in some implementations, every instruction fetch performed while MSR[IR] = 1, 
and every instruction execution requiring address translation when MSR[DR] = 1, may 
modify SRR0.


For information on how specific exceptions affect SRR0, refer to the descriptions of
individual exceptions in Chapter 6, “Exceptions.” 


2.3.10 Machine Status Save/Restore Register 1 (SRR1) 


The SRR1 is a 64-bit register in 64-bit implementations and a 32-bit register in 32-bit
implementations. SRR1 is used to save machine status on exceptions and to restore 
machine status when an rfi instruction is executed. The format of SRR1 is shown in 
Figure 2-23. 


SRR1 


0 31 


Figure 2-23. Machine Status Save/Restore Register 1 (SRR1) 
When an exception occurs, bits 1-4 and 10-15 of SRR1 are loaded with exception-specific
information and bits 16-23, 25-27, and 30-31 of MSR are placed into the corresponding
bit positions of SRR1. When rfi is executed, MSR[16-23, 25-27, 30-31] are loaded from
SRR1[16-23, 25-27, 30-31].
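Using the bit-numbering convention of Table 2-8 (bit n of a 32-bit register is 1u << (31 - n)), the bit lists above become two masks. A C sketch under stated assumptions: reserved SRR1 bits are written as 0, and MSR bits outside the restored set are left unchanged by rfi, although an implementation may save and restore additional bits, as the next paragraph notes. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* MSR bits 16-23, 25-27, and 30-31 (IBM numbering): saved to and restored from SRR1. */
#define SRR1_MSR_MASK  0x0000FF73u
/* SRR1 bits 1-4 and 10-15: exception-specific information. */
#define SRR1_INFO_MASK 0x783F0000u

/* On an exception: exception information plus the saved MSR bits.
   Reserved SRR1 bits are left 0 in this sketch. */
static uint32_t srr1_on_exception(uint32_t msr, uint32_t info) {
    return (info & SRR1_INFO_MASK) | (msr & SRR1_MSR_MASK);
}

/* On rfi: copy the saved bits back; other MSR bits are left unchanged here. */
static uint32_t msr_after_rfi(uint32_t msr, uint32_t srr1) {
    return (msr & ~SRR1_MSR_MASK) | (srr1 & SRR1_MSR_MASK);
}
```

For example, an MSR of 0x0000_9032 (EE, ME, IR, DR, and RI set) is saved unchanged in SRR1 bits 16-31 and restored by rfi.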


The remaining bits of SRR1 are defined as reserved. An implementation may define one or 
more of these bits, and in this case, may also cause them to be saved from MSR on an 
exception and restored to MSR from SRR1 on an rfi. 


Note that, in some implementations, every instruction fetch when MSR[IR] = 1, and every 
instruction execution requiring address translation when MSR[DR] = 1, may modify SRR1. 


For information on how specific exceptions affect SRR1, refer to the individual exceptions 
in Chapter 6, “Exceptions.” 


2.3.11 Floating-Point Exception Cause Register (FPECR) 


The FPECR register may be used to identify the cause of a floating-point exception. Note 
that the FPECR is an optional register in the PowerPC architecture and may be 
implemented differently (or not at all) in the design of each processor. The user’s manual 
of a specific processor will describe the functionality of the FPECR, if it is implemented in 
that processor. 


2.3.12 Time Base Facility (TB)—OEA 


As described in Section 2.2, “PowerPC VEA Register Set—Time Base,” the time base (TB) 
provides a long-period counter driven by an implementation-dependent frequency. The 
VEA defines user-level read-only access to the TB. Writing to the TB is reserved for 
supervisor-level applications such as operating systems and boot-strap routines. The OEA 
defines supervisor-level write access to the TB.


The TB is a volatile resource and must be initialized during reset. Some implementations 
may initialize the TB with a known value; however, there is no guarantee of automatic 
initialization of the TB when the processor is reset. The TB runs continuously at start-up. 


For more information on the user-level aspects of the time base, refer to Section 2.2, 
“PowerPC VEA Register Set—Time Base.” 


2.3.12.1 Writing to the Time Base 


Note that writing to the TB is reserved for supervisor-level software. 


The simplified mnemonics, mttbl and mttbu, write the lower and upper halves of the TB, 
respectively. The simplified mnemonics listed above are for the mtspr instruction; see 
Appendix F, “Simplified Mnemonics,” for more information. The mtspr, mttbl, and mttbu 
instructions treat TBL and TBU as separate 32-bit registers; setting one leaves the other 
unchanged. It is not possible to write the entire 64-bit time base in a single instruction. 
The instructions for writing the time base are not dependent on the implementation or
mode. Thus, code written to set the TB on a 32-bit implementation will work correctly on 
a 64-bit implementation running in either 64- or 32-bit mode. 


The TB can be written by a sequence such as: 


        lwz    rx,upper   # load 64-bit value for
        lwz    ry,lower   #  TB into rx and ry
        li     rz,0
        mttbl  rz         # force TBL to 0
        mttbu  rx         # set TBU
        mttbl  ry         # set TBL


Provided that no exceptions occur while the last three instructions are being executed, 
loading 0 into TBL prevents the possibility of a carry from TBL to TBU while the time base 
is being initialized. 
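The effect of forcing TBL to 0 can be modeled in C with a time base that ticks once between the writes. The tb_regs structure and the functions are hypothetical; like the text, the sketch assumes no exception intervenes and at most one tick occurs during the sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two 32-bit halves of the time base. */
typedef struct {
    uint32_t tbu, tbl;
} tb_regs;

/* One time base increment, carrying from TBL into TBU. */
static void tb_tick(tb_regs *t) {
    if (++t->tbl == 0)
        t->tbu++;
}

/* Without clearing TBL first: a tick after mttbu can carry into the new TBU. */
static void write_tb_naive(tb_regs *t, uint32_t upper, uint32_t lower) {
    t->tbu = upper;      /* mttbu rx */
    tb_tick(t);          /* time base advances */
    t->tbl = lower;      /* mttbl ry */
}

/* The sequence from the text: TBL is forced to 0 before TBU is written. */
static void write_tb_safe(tb_regs *t, uint32_t upper, uint32_t lower) {
    t->tbl = 0;          /* mttbl rz, with rz = 0 */
    t->tbu = upper;      /* mttbu rx */
    tb_tick(t);          /* TBL goes from 0 to 1, so no carry is possible */
    t->tbl = lower;      /* mttbl ry */
}
```

Starting from TBL = 0xFFFF_FFFF, write_tb_naive ends with TBU one greater than requested, while write_tb_safe leaves exactly the requested value.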


For information on reading the time base, refer to Section 2.2.1, “Reading the Time Base.” 


2.3.13 Decrementer Register (DEC) 


The decrementer register (DEC), shown in Figure 2-24, is a 32-bit decrementing counter 
that provides a mechanism for causing a decrementer exception after a programmable 
delay. The DEC frequency is based on the same implementation-dependent frequency that 
drives the time base. 





DEC 


Figure 2-24. Decrementer Register (DEC) 


2.3.13.1 Decrementer Operation 
The DEC counts down, causing an exception (unless masked by MSR[EE]) when it passes 
through zero. The DEC satisfies the following requirements: 


• The operation of the time base and the DEC are coherent (that is, the counters are
driven by the same fundamental time base).

• Loading a GPR from the DEC has no effect on the DEC.

• Storing the contents of a GPR to the DEC replaces the value in the DEC with the
value in the GPR.

• Whenever bit 0 of the DEC changes from 0 to 1, a decrementer exception request is
signaled. Multiple DEC exception requests may be received before the first
exception occurs; however, any additional requests are canceled when the exception
occurs for the first request.

• If the DEC is altered by software and the content of bit 0 is changed from 0 to 1, an
exception request is signaled.
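The requirements above can be modeled in C as a sketch. The `dec_model_t` type and its helpers are hypothetical names; the model tracks only the counter and a single pending-request flag, and it ignores masking by MSR[EE]. Because PowerPC numbers bits from the most-significant end, bit 0 of the DEC is its MSB, so a request is signaled when the count passes from 0 to 0xFFFFFFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PowerPC bit 0 is the most-significant bit. */
#define DEC_BIT0 0x80000000u

/* Hypothetical model of the DEC and its exception-request latch. */
typedef struct {
    uint32_t value;
    bool     request;   /* multiple requests collapse into one */
} dec_model_t;

/* Any change of bit 0 from 0 to 1 signals a request. */
static void dec_set(dec_model_t *d, uint32_t new_value) {
    if (!(d->value & DEC_BIT0) && (new_value & DEC_BIT0))
        d->request = true;
    d->value = new_value;
}

/* One tick of the clock shared with the time base: count down by one. */
static void dec_tick(dec_model_t *d) { dec_set(d, d->value - 1u); }

/* mtdec: the GPR value replaces the DEC (and may itself signal a request). */
static void dec_write(dec_model_t *d, uint32_t gpr) { dec_set(d, gpr); }

/* mfdec: reading the DEC has no effect on it. */
static uint32_t dec_read(const dec_model_t *d) { return d->value; }

/* Taking the exception cancels any additional pending requests. */
static void dec_take_exception(dec_model_t *d) {
    assert(d->request);   /* the exception is taken only for a pending request */
    d->request = false;
}
```

Starting from 2, the third tick moves the DEC from 0 to 0xFFFFFFFF and sets the request; a software write that sets bit 0 from a value with bit 0 clear also sets it.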


Chapter 2. PowerPC Register Set 2-33 


2.3.13.2 Writing and Reading the DEC 

The content of the DEC can be read or written using the mfspr and mtspr instructions, both 
of which are supervisor-level when they refer to the DEC. Using a simplified mnemonic for 
the mtspr instruction, the DEC may be written from GPR rA with the following: 


mtdec rA 
Using a simplified mnemonic for the mfspr instruction, the DEC may be read into GPR rA 
with the following: 


mfdec rA 
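Because the DEC is clocked at the same implementation-dependent frequency as the time base, the value written with mtdec for a given delay is a tick count derived from that frequency. The C sketch below assumes the frequency is known to software (passed as the hypothetical parameter `tb_hz`) and follows the behavior described above: a request is signaled when the count passes from 0 to 0xFFFFFFFF, so loading N requests the exception after N + 1 decrements. It caps the result at 0x7FFFFFFF so that bit 0 starts out clear. Since the delay between the DEC becoming negative and the exception being signaled is not defined, the result is a lower bound, not an exact deadline.

```c
#include <assert.h>
#include <stdint.h>

/* Largest DEC value with bit 0 (the most-significant bit) clear. */
#define DEC_MAX_POSITIVE 0x7FFFFFFFu

/* Hypothetical helper: DEC value that requests an exception after at least
   delay_us microseconds, given a time-base/DEC frequency of tb_hz. */
static uint32_t dec_value_for_delay(uint64_t delay_us, uint64_t tb_hz) {
    assert(tb_hz > 0);
    /* ticks = ceil(delay_us * tb_hz / 1e6); assumes the product fits in 64 bits */
    uint64_t ticks = (delay_us * tb_hz + 999999u) / 1000000u;
    if (ticks == 0)
        ticks = 1;                    /* at least one decrement */
    uint64_t value = ticks - 1;       /* N requests after N + 1 decrements */
    return value > DEC_MAX_POSITIVE ? DEC_MAX_POSITIVE : (uint32_t)value;
}
```

For example, with an assumed 25 MHz time-base clock, a 1 ms delay gives 25000 ticks and a DEC value of 24999.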


2.3.14 Data Address Breakpoint Register (DABR) 


The data address breakpoint facility is optional to the PowerPC architecture and is controlled
by an optional SPR, the DABR. The DABR is a 64-bit register in 64-bit implementations and a
32-bit register in 32-bit implementations. If the data address breakpoint facility is
implemented, it is recommended, but not required, that it be implemented as described in
this section.


The data address breakpoint facility provides a means to detect accesses to a designated 
double word. The address comparison is done on an effective address, and it applies to data 
accesses only. It does not apply to instruction fetches. 


The DABR is shown in Figure 2-25. 





DAB (bits 0-28) | BT (bit 29) | DW (bit 30) | DR (bit 31)


Figure 2-25. Data Address Breakpoint Register (DABR)
Table 2-17 describes the fields in the DABR. 
Table 2-17. DABR—Bit Settings 


Bits    Name   Description
0-28    DAB    Data address breakpoint
29      BT     Breakpoint translation enable
30      DW     Data write enable
31      DR     Data read enable
A data address breakpoint match is detected for a load or store instruction if the following
three conditions are met for any byte accessed:


• EA[0-28] = DABR[DAB]

• MSR[DR] = DABR[BT]

• The instruction is a store and DABR[DW] = 1, or the instruction is a load and
DABR[DR] = 1.

Even if the above conditions are satisfied, it is undefined whether a match occurs in the 
following cases: 

• A store conditional instruction (stwcx.) in which the store is not performed

• A load or store string instruction (lswx or stswx) with a zero length

• A dcbz, dcba, eciwx, or ecowx instruction. For the purpose of determining whether
a match occurs, eciwx is treated as a load, and dcbz, dcba, and ecowx are treated as
stores.


The cache management instructions other than dcbz and dcba never cause a match. If dcbz
or dcba causes a match, some or all of the target memory locations may have been updated.
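As a sketch of the three match conditions, the C function below tests a load or store of `len` bytes against a 32-bit DABR. The function name and parameters are illustrative assumptions; the field masks follow the bit positions in Figure 2-25, with bit 0 as the most-significant bit. The undefined cases listed above (stwcx., zero-length string instructions, and the cache and external-control instructions) are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DABR fields in a 32-bit implementation (bit 0 is the MSB):
   DAB = bits 0-28, BT = bit 29, DW = bit 30, DR = bit 31. */
#define DABR_DAB 0xFFFFFFF8u
#define DABR_BT  0x00000004u
#define DABR_DW  0x00000002u
#define DABR_DR  0x00000001u

/* Returns true if a load or store of len bytes at effective address ea
   matches the breakpoint; msr_dr is the current value of MSR[DR]. */
static bool dabr_match(uint32_t dabr, uint32_t ea, uint32_t len,
                       bool is_store, bool msr_dr) {
    assert(len > 0);
    if (msr_dr != ((dabr & DABR_BT) != 0))            /* MSR[DR] = DABR[BT] */
        return false;
    if (!(dabr & (is_store ? DABR_DW : DABR_DR)))     /* DW for stores, DR for loads */
        return false;
    for (uint32_t i = 0; i < len; i++)                /* any byte accessed */
        if (((ea + i) & DABR_DAB) == (dabr & DABR_DAB))   /* EA[0-28] = DABR[DAB] */
            return true;
    return false;
}
```

A word load at 0x0FFE matches a breakpoint on the double word at 0x1000, because two of its four bytes fall in that double word.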


A match generates a DSI exception. Refer to Section 6.4.3, “DSI Exception (0x00300),” for 
more information on the data address breakpoint facility. 


2.3.15 External Access Register (EAR) 


The EAR is an optional 32-bit SPR that controls access to the external control facility and 
identifies the target device for external control operations. The external control facility 
provides a means for user-level instructions to communicate with special external devices. 
The EAR is shown in Figure 2-26. 


E (bit 0) | 000 0000 0000 0000 0000 0000 00 (bits 1-25, reserved) | RID (bits 26-31)


Figure 2-26. External Access Register (EAR)


The high-order bits of the resource ID (RID) field beyond the width of the RID supported 
by a particular implementation are treated as reserved bits. 


The EAR register is provided to support the External Control In Word Indexed (eciwx) and 
External Control Out Word Indexed (ecowx) instructions, which are described in Chapter 8, 
“Instruction Set.” Although access to the EAR is supervisor-level, the operating system can 
determine which tasks are allowed to issue external access instructions and when they are 
allowed to do so. The bit settings for the EAR are described in Table 2-18. Interpretation of 
the physical address transmitted by the eciwx and ecowx instructions and the 32-bit value 
transmitted by the ecowx instruction is not prescribed by the PowerPC OEA but is 
determined by the target device. The data access of eciwx and ecowx is performed as
though the memory access mode bits (WIMG) were 0101. 


For example, if the external control facility is used to support a graphics adapter, the ecowx 
instruction could be used to send the translated physical address of a buffer containing 
graphics data to the graphics device. The eciwx instruction could be used to load status 
information from the graphics adapter. 


Table 2-18. External Access Register (EAR) Bit Settings 


Bits    Name      Description
0       E         Enable bit
                  1  Enabled
                  0  Disabled
                  If this bit is set, the eciwx and ecowx instructions can perform the
                  specified external operation. If the bit is cleared, an eciwx or ecowx
                  instruction causes a DSI exception.
1-25    Reserved
26-31   RID       Resource ID


This register can also be accessed by using the mtspr and mfspr instructions. 
Synchronization requirements for the EAR are shown in Table 2-19 and Table 2-20. 
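A minimal C sketch of how the EAR fields gate an external control access. The helper names are hypothetical, and `rid_width` stands for the implementation-specific number of RID bits, which this section does not fix; high-order RID bits beyond that width are treated as reserved and ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EAR_E        0x80000000u   /* bit 0: enable */
#define EAR_RID_MASK 0x0000003Fu   /* bits 26-31: resource ID */

typedef enum { EXT_OK, EXT_DSI } ext_result_t;

static bool ear_enabled(uint32_t ear) { return (ear & EAR_E) != 0; }

/* RID as seen by an implementation that supports rid_width bits (assumed 1-6);
   higher-order RID bits are treated as reserved and dropped. */
static uint32_t ear_rid(uint32_t ear, unsigned rid_width) {
    assert(rid_width >= 1 && rid_width <= 6);
    return ear & EAR_RID_MASK & ((1u << rid_width) - 1u);
}

/* eciwx/ecowx proceed only when E is set; otherwise a DSI exception occurs. */
static ext_result_t ext_access_check(uint32_t ear) {
    return ear_enabled(ear) ? EXT_OK : EXT_DSI;
}
```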





2.3.16 Processor Identification Register (PIR) 


The PIR register is used to differentiate between individual processors in a multiprocessor 
environment. Note that the PIR is an optional register in the PowerPC architecture and may 
be implemented differently (or not at all) in the design of each processor. The user’s manual 
of a specific processor will describe the functionality of the PIR, if it is implemented in that 
processor. 


2.3.17 Synchronization Requirements for Special Registers and for Lookaside Buffers


Changing the value in certain system registers, and invalidating TLB entries, can cause 
alteration of the context in which data addresses and instruction addresses are interpreted, 
and in which instructions are executed. An instruction that alters the context in which data 
addresses or instruction addresses are interpreted, or in which instructions are executed, is 
called a context-altering instruction. The context synchronization required for context- 
altering instructions is shown in Table 2-19 for data access and Table 2-20 for instruction 
fetch and execution. 


A context-synchronizing exception (that is, any exception except nonrecoverable system 
reset or nonrecoverable machine check) can be used instead of a context-synchronizing 
instruction. In the tables, if no software synchronization is required before (after) a context- 
altering instruction, the synchronizing instruction before (after) the context-altering 
instruction should be interpreted as meaning the context-altering instruction itself. 
A synchronizing instruction before the context-altering instruction ensures that all
instructions up to and including that synchronizing instruction are fetched and executed in 
the context that existed before the alteration. A synchronizing instruction after the context- 
altering instruction ensures that all instructions after that synchronizing instruction are 
fetched and executed in the context established by the alteration. Instructions after the first 
synchronizing instruction, up to and including the second synchronizing instruction, may 
be fetched or executed in either context. 


If a sequence of instructions contains context-altering instructions and contains no 
instructions that are affected by any of the context alterations, no software synchronization 
is required within the sequence. 


Note that some instructions that occur naturally in the program, such as the rfi at the end of 
an exception handler, provide the required synchronization. 


No software synchronization is required before altering the MSR (except when altering the 
MSR[POW] or MSR[LE] bits; see Table 2-19 and Table 2-20), because mtmsr is execution
synchronizing. No software synchronization is required before most of the other alterations 
shown in Table 2-20, because all instructions before the context-altering instruction are 
fetched and decoded before the context-altering instruction is executed (the processor must 
determine whether any of the preceding instructions are context synchronizing). 


Table 2-19 provides information on data access synchronization requirements. 


Table 2-19. Data Access Synchronization 










Table 2-19. Data Access Synchronization (Continued) 
Required Prior                        Required After
Context-synchronizing instruction     Context-synchronizing instruction or sync
Context-synchronizing instruction     Context-synchronizing instruction or sync


Notes:
1 Synchronization requirements for changing the power conserving mode are implementation-dependent.
2 A context synchronizing instruction is required after modification of the MSR[ME] bit to ensure that the
modification takes effect for subsequent machine check exceptions, which may not be recoverable and
therefore may not be context synchronizing.
3 Synchronization requirements for changing from one endian mode to the other are implementation-dependent.
4 SDR1 must not be altered when MSR[DR] = 1 or MSR[IR] = 1; if it is, the results are undefined.
5 A sync instruction is required before the mtspr instruction because SDR1 identifies the page table and thereby
the location of the referenced and changed (R and C) bits. To ensure that R and C bits are updated in the
correct page table, SDR1 must not be altered until all R and C bit updates due to instructions before the mtspr
have completed. A sync instruction guarantees this synchronization of R and C bit updates, while neither a
context synchronizing operation nor the instruction fetching mechanism does so.
6 Synchronization requirements for changing the DABR are implementation-dependent.
7 Multiprocessor systems have other requirements to synchronize TLB invalidate.


For information on instruction access synchronization requirements, see Table 2-20. 


Table 2-20. Instruction Access Synchronization 


Table 2-20. Instruction Access Synchronization (Continued)


Instruction/Event         Required Prior    Required After
mtmsr (LE) 5
mtsr [or mtsrin] 4                          Context-synchronizing instruction
mtspr (SDR1) 6, 7         sync              Context-synchronizing instruction


Notes:
1 Synchronization requirements for changing the power conserving mode are implementation-dependent.
2 The effect of altering the EE bit is immediate as follows:
• If an mtmsr sets the EE bit to 0, neither an external interrupt nor a decrementer exception can occur after
the instruction is executed.
• If an mtmsr sets the EE bit to 1 when an external interrupt, decrementer exception, or higher priority
exception exists, the corresponding exception occurs immediately after the mtmsr is executed, and
before the next instruction is executed in the program that set MSR[EE].
3 A context synchronizing instruction is required after modification of the MSR[ME] bit to ensure that the
modification takes effect for subsequent machine check exceptions, which may not be recoverable and therefore
may not be context synchronizing.
4 The alteration must not cause an implicit branch in physical address space. The physical address of the context-
altering instruction and of each subsequent instruction, up to and including the next context synchronizing
instruction, must be independent of whether the alteration has taken effect.
5 Synchronization requirements for changing from one endian mode to the other are implementation-dependent.
6 SDR1 must not be altered when MSR[DR] = 1 or MSR[IR] = 1; if it is, the results are undefined.
7 A sync instruction is required before the mtspr instruction because SDR1 identifies the page table and thereby
the location of the referenced and changed (R and C) bits. To ensure that R and C bits are updated in the correct
page table, SDR1 must not be altered until all R and C bit updates due to instructions before the mtspr have
completed. A sync instruction guarantees this synchronization of R and C bit updates, while neither a context
synchronizing operation nor the instruction fetching mechanism does so.
8 The elapsed time between the content of the decrementer becoming negative and the signaling of the
decrementer exception is not defined.
9 Multiprocessor systems have other requirements to synchronize TLB invalidate.
Chapter 3
Operand Conventions 


This chapter describes the operand conventions as they are represented in two levels of the 
PowerPC architecture—user instruction set architecture (UISA) and virtual environment 
architecture (VEA). Detailed descriptions are provided of conventions used for storing 
values in registers and memory, accessing PowerPC registers, and representing data in these 
registers in both big- and little-endian modes. Additionally, the floating-point data formats 
and exception conditions are described. Refer to Appendix D, “Floating-Point Models,” for 
more information on the implementation of the IEEE floating-point execution models. 


3.1 Data Organization in Memory and Data Transfers 


In a PowerPC microprocessor-based system, bytes in memory are numbered consecutively 
starting with 0. Each number is the address of the corresponding byte. Memory operands 
may be bytes, half words, words, or double words, or, for the load and store multiple and 
the load and store string instructions, a sequence of bytes or words. The address of a 
memory operand is the address of its first byte (that is, of its lowest-numbered byte). 
Operand length is implicit for each instruction. 


The following sections describe the concepts of alignment and byte ordering of data, and 
their significance to the PowerPC architecture. 


3.1.1 Aligned and Misaligned Accesses 


The operand of a single-register memory access instruction has a natural alignment 
boundary equal to the operand length. In other words, the natural address of an operand is 
an integral multiple of the operand length. A memory operand is said to be aligned if it is 
aligned at its natural boundary; otherwise it is misaligned. Instructions are always four 
bytes long and word-aligned. 
Operands for single-register memory access instructions have the characteristics shown in
Table 3-1. (Although not permitted as memory operands, quad words are shown because 
quad-word alignment is desirable for certain memory operands.) 


Table 3-1. Memory Operand Alignment 


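Operand       Length     Aligned Address (four low-order bits)
Byte          8 bits     xxxx
Half word     2 bytes    xxx0
Word          4 bytes    xx00
Double word   8 bytes    x000
Quad word     16 bytes   0000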


Note: An x in an address bit position indicates that the bit can be 0 or 1 
independent of the state of other bits in the address. 





The concept of alignment is also applied more generally to data in memory. For example, 
a 12-byte data item is said to be word-aligned if its address is a multiple of four. 
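For power-of-two boundaries, the alignment rule reduces to checking that the low-order address bits are zero, which is what the x patterns in Table 3-1 express. A small C sketch (the function names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of boundary; boundary must be a power of two. */
static bool is_aligned_to(uint32_t addr, uint32_t boundary) {
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
    return (addr & (boundary - 1)) == 0;
}

/* Natural alignment: the boundary equals the operand length (1, 2, 4, or 8). */
static bool is_naturally_aligned(uint32_t addr, uint32_t len) {
    return is_aligned_to(addr, len);
}
```

The same helper covers the more general usage: a 12-byte item at 0x1004 is word-aligned because `is_aligned_to(0x1004, 4)` holds.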


Some instructions require their memory operands to have certain alignment. In addition, 
alignment may affect performance. For single-register memory access instructions, the best 
performance is obtained when memory operands are aligned. 


3.1.2 Byte Ordering 


If individual data items were indivisible, the concept of byte ordering would be 
unnecessary. The order of bits or groups of bits within the smallest addressable unit of 
memory is irrelevant, because nothing can be observed about such order. Order matters 
only when scalars, which the processor and programmer regard as indivisible quantities, 
can be made up of more than one addressable unit of memory. 


For PowerPC processors, the smallest addressable memory unit is the byte (8 bits), and 
scalars are composed of one or more sequential bytes. When a 32-bit scalar is moved from 
a register to memory, it occupies four consecutive bytes in memory, and a decision must be 
made regarding the order of these bytes in these four addresses. 


Although the choice of byte ordering is arbitrary, only two orderings are practical—big- 
endian and little-endian. The PowerPC architecture supports both big- and little-endian 
byte ordering. The default byte ordering is big-endian. 


3.1.2.1 Big-Endian Byte Ordering 


For big-endian scalars, the most-significant byte (MSB) is stored at the lowest (or starting) 
address while the least-significant byte (LSB) is stored at the highest (or ending) address. 
This is called big-endian because the big end of the scalar comes first in memory. 
3.1.2.2 Little-Endian Byte Ordering


For little-endian scalars, the least-significant byte is stored at the lowest (or starting) 
address while the most-significant byte is stored at the highest (or ending) address. This is 
called little-endian because the little end of the scalar comes first in memory. 
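The two orderings can be made concrete in C by storing the same 32-bit scalar byte by byte. This is a portable sketch with illustrative helper names; it does not depend on the host's own byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Big-endian: most-significant byte at the lowest address. */
static void store_be32(uint8_t *p, uint32_t v) {
    assert(p);
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Little-endian: least-significant byte at the lowest address. */
static void store_le32(uint8_t *p, uint32_t v) {
    assert(p);
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Storing 0x1112_1314 big-endian places 11 12 13 14 at increasing addresses; reading those bytes as little-endian yields 0x1413_1211.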


3.1.3 Structure Mapping Examples 


Figure 3-1 shows a C programming example that contains an assortment of scalars and one 
array of characters (a string). The value presumed to be in each structure element is shown 
in hexadecimal in the comments (except for the character array, which is represented by a 
sequence of characters, each enclosed in single quote marks). 


struct {
    int    a;      /* 0x1112_1314                  word           */
    double b;      /* 0x2122_2324_2526_2728        double word    */
    char  *c;      /* 0x3132_3334                  word           */
    char   d[7];   /* 'L','M','N','O','P','Q','R'  array of bytes */
    short  e;      /* 0x5152                       half word      */
    int    f;      /* 0x6162_6364                  word           */
} S;


Figure 3-1. C Program Example—Data Structure S 


The data structure S is used throughout this section to demonstrate how the bytes that 
comprise each element (a, b, c, d, e, and f) are mapped into memory. 
3.1.3.1 Big-Endian Mapping

The big-endian mapping of the structure, S, is shown in Figure 3-2. Addresses are shown in 
hexadecimal below each byte. The content of each byte, as shown in the preceding C 
programming example, is shown in hexadecimal and, for the character array, as characters 
enclosed in single quote marks. Note that the most-significant byte of each scalar is at the 
lowest address. 



























































Contents 11 12 13 14 (x) (x) (x) (x)
Address 00 01 02 03 04 05 06 07
Contents 21 22 23 24 25 26 27 28
Address 08 09 0A 0B 0C 0D 0E 0F
Contents 31 32 33 34 'L' 'M' 'N' 'O'
Address 10 11 12 13 14 15 16 17
Contents 'P' 'Q' 'R' (x) 51 52 (x) (x)
Address 18 19 1A 1B 1C 1D 1E 1F
Contents 61 62 63 64 (x) (x) (x) (x)
Address 20 21 22 23 24 25 26 27


Figure 3-2. Big-Endian Mapping of Structure S 


The structure mapping introduces padding (skipped bytes indicated by (x) in Figure 3-2)
in the map in order to align the scalars on their proper boundaries—four bytes between 
elements a and b, one byte between elements d and e, and two bytes between elements e 
and f. Note that the padding is dependent on the compiler; it is not a function of the 
architecture. 
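The padding in Figure 3-2 follows from placing each element at the next multiple of its natural alignment. The C sketch below reproduces those offsets under stated assumptions: a 32-bit target where int and pointers are 4 bytes, double is 8 bytes with 8-byte alignment, and short is 2 bytes. Because padding is compiler-dependent, this models one common rule rather than what every compiler does; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Lay out n fields in order, aligning each to aligns[i]; offsets[] receives
   each field's offset. Returns the total size, rounded up to the largest
   field alignment so that arrays of the structure stay aligned. */
static size_t layout_fields(const size_t *sizes, const size_t *aligns,
                            size_t n, size_t *offsets) {
    size_t off = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        off = align_up(off, aligns[i]);
        offsets[i] = off;
        off += sizes[i];
        if (aligns[i] > max_align)
            max_align = aligns[i];
    }
    return align_up(off, max_align);
}
```

Calling `layout_fields` with sizes {4, 8, 4, 7, 2, 4} and alignments {4, 8, 4, 1, 2, 4} for elements a through f gives offsets 0x00, 0x08, 0x10, 0x14, 0x1C, and 0x20 and a total size of 0x28, matching the big-endian map.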
3.1.3.2 Little-Endian Mapping


Figure 3-3 shows the structure, S, using little-endian mapping. Note that the least- 
significant byte of each scalar is at the lowest address. 






















































































Contents 14 13 12 11 (x) (x) (x) (x)
Address 00 01 02 03 04 05 06 07
Contents 28 27 26 25 24 23 22 21
Address 08 09 0A 0B 0C 0D 0E 0F
Contents 34 33 32 31 'L' 'M' 'N' 'O'
Address 10 11 12 13 14 15 16 17
Contents 'P' 'Q' 'R' (x) 52 51 (x) (x)
Address 18 19 1A 1B 1C 1D 1E 1F
Contents 64 63 62 61 (x) (x) (x) (x)
Address 20 21 22 23 24 25 26 27


Figure 3-3. Little-Endian Mapping of Structure S 


Figure 3-3 shows the sequence of double words laid out with addresses increasing from left 
to right. Programmers familiar with little-endian byte ordering may be more accustomed to 
viewing double words laid out with addresses increasing from right to left, as shown in 
Figure 3-4. This allows the little-endian programmer to view each scalar in its natural byte 
order of MSB to LSB. However, to demonstrate how the PowerPC architecture provides 
both big- and little-endian support, this section uses the convention of showing addresses 
increasing from left to right, as in Figure 3-3. 
Contents (x) (x) (x) (x) 11 12 13 14
Address 07 06 05 04 03 02 01 00
Contents 21 22 23 24 25 26 27 28
Address 0F 0E 0D 0C 0B 0A 09 08
Contents 'O' 'N' 'M' 'L' 31 32 33 34
Address 17 16 15 14 13 12 11 10
Contents (x) (x) 51 52 (x) 'R' 'Q' 'P'
Address 1F 1E 1D 1C 1B 1A 19 18
Contents (x) (x) (x) (x) 61 62 63 64
Address 27 26 25 24 23 22 21 20
































Figure 3-4. Little-Endian Mapping of Structure S —Alternate View 


3.1.4 PowerPC Byte Ordering 


The PowerPC architecture supports both big- and little-endian byte ordering. The default 
byte ordering is big-endian. However, the code sequence used to switch from big- to little- 
endian mode may differ among processors. 


The PowerPC architecture defines two bits in the MSR for specifying byte ordering—LE 
(little-endian mode) and ILE (exception little-endian mode). The LE bit specifies the endian 
mode in which the processor is currently operating and ILE specifies the mode to be used 
when an exception handler is invoked. That is, when an exception occurs, the ILE bit (as 
set for the interrupted process) is copied into MSR[LE] to select the endian mode for the 
context established by the exception. For both bits, a value of 0 specifies big-endian mode
and a value of 1 specifies little-endian mode.
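The ILE-to-LE copy can be sketched in C. The sketch assumes the 32-bit MSR layout in which ILE is bit 15 and LE is bit 31 (bit 0 being the most-significant bit), and it models only the LE bit; the other MSR changes made when an exception is taken are omitted. Helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a PowerPC bit number (bit 0 = MSB) to a 32-bit mask. */
static uint32_t ppc_bit(unsigned n) {
    assert(n < 32);
    return 0x80000000u >> n;
}

#define MSR_ILE ppc_bit(15)   /* exception little-endian mode */
#define MSR_LE  ppc_bit(31)   /* little-endian mode */

/* LE for the exception context: a copy of the interrupted process's ILE. */
static uint32_t msr_le_on_exception(uint32_t msr) {
    if (msr & MSR_ILE)
        return msr | MSR_LE;
    return msr & ~MSR_LE;
}
```

A process running big-endian with ILE set (MSR = 0x0001_0000) enters its exception handler with LE set, while one running little-endian with ILE clear enters it in big-endian mode.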


The PowerPC architecture also provides load and store instructions that reverse byte 
ordering. These instructions have the effect of loading and storing data in the endian mode 
opposite from that which the processor is operating. See Section 4.2.3.4, “Integer Load and 
Store with Byte-Reverse Instructions,” for more information on these instructions. 
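The effect of a byte-reversed word load such as lwbrx can be modeled in C as an ordinary big-endian word load followed by reversing the four bytes. This is an illustrative sketch with hypothetical helper names, not the processor's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value. */
static uint32_t byte_reverse32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Model of a word load in big-endian mode: MSB at the lowest address. */
static uint32_t load_word_be(const uint8_t *p) {
    assert(p);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Model of a byte-reversed word load: the same four bytes, assembled
   in the opposite order, so little-endian data reads correctly. */
static uint32_t load_word_byte_reversed(const uint8_t *p) {
    return byte_reverse32(load_word_be(p));
}
```

Given the bytes 14 13 12 11 (a little-endian image of 0x1112_1314), the plain load returns 0x1413_1211 and the byte-reversed load returns 0x1112_1314.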


3.1.4.1 Aligned Scalars in Little-Endian Mode 


Chapter 4, “Addressing Modes and Instruction Set Summary,” describes the effective 
address calculation for the load and store instructions. For processors in little-endian mode, 
the effective address is modified before being used to access memory. The three low-order 
address bits of the effective address are exclusive-ORed (XOR) with a three-bit value that 
depends on the length of the operand (1, 2, 4, or 8 bytes), as shown in Table 3-2. This 
address modification is called ‘munging’. Note that although the process is described in the 
architecture, the actual term ‘munging’ is not defined or used in the specification. However,
the term is commonly used to describe the effective address modifications necessary for 
converting big-endian addressed data to little-endian addressed data. 


Table 3-2. EA Modifications 


Data Width (Bytes)    EA Modification
8                     No change
4                     XOR with 0b100
2                     XOR with 0b110
1                     XOR with 0b111





The munged physical address is passed to the cache or to main memory, and the specified 
width of the data is transferred (in big-endian order—that is, MSB at the lowest address, 
LSB at the highest address) between a GPR or FPR and the addressed memory locations 
(as modified). 


Munging makes it appear to the processor that individual aligned scalars are stored as little- 
endian, when in fact they are stored in big-endian order, but at different byte addresses 
within double words. Only the address is modified, not the byte order. 
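The munging rule can be checked against Figure 3-5 with a short C model. The helper names are hypothetical, and the sketch handles only aligned scalars, because the XOR rule applies only to them (see Section 3.1.4.2).

```c
#include <assert.h>
#include <stdint.h>

/* EA modification in little-endian mode: XOR the three low-order bits. */
static uint32_t munge_ea(uint32_t ea, unsigned len) {
    switch (len) {
    case 1:  return ea ^ 0x7u;
    case 2:  return ea ^ 0x6u;
    case 4:  return ea ^ 0x4u;
    case 8:  return ea;
    default: assert(!"operand length must be 1, 2, 4, or 8"); return ea;
    }
}

/* Aligned little-endian-mode store as seen by memory: munge the EA, then
   write the scalar in big-endian byte order at the munged address. */
static void le_mode_store(uint8_t *mem, uint32_t ea, uint64_t value, unsigned len) {
    uint32_t pa = munge_ea(ea, len);         /* checks len first */
    assert(ea % len == 0);                   /* the XOR rule holds only when aligned */
    for (unsigned i = 0; i < len; i++)       /* MSB at the lowest address */
        mem[pa + i] = (uint8_t)(value >> (8 * (len - 1 - i)));
}

/* Matching aligned load: same munged address, big-endian assembly. */
static uint64_t le_mode_load(const uint8_t *mem, uint32_t ea, unsigned len) {
    uint32_t pa = munge_ea(ea, len);
    assert(ea % len == 0);
    uint64_t value = 0;
    for (unsigned i = 0; i < len; i++)
        value = (value << 8) | mem[pa + i];
    return value;
}
```

Storing element a (0x1112_1314) at effective address 0x00 writes 11 12 13 14 to physical addresses 0x04-0x07, and storing element e (0x5152) at 0x1C writes 51 52 to 0x1A-0x1B, as in Figure 3-5; loading through the same munged addresses returns the original values.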


Taking into account the preceding description of munging, in little-endian mode, structure 
S is placed in memory as shown in Figure 3-5. 



























































Contents (x) (x) (x) (x) 11 12 13 14
Address 00 01 02 03 04 05 06 07
Contents 21 22 23 24 25 26 27 28
Address 08 09 0A 0B 0C 0D 0E 0F
Contents 'O' 'N' 'M' 'L' 31 32 33 34
Address 10 11 12 13 14 15 16 17
Contents (x) (x) 51 52 (x) 'R' 'Q' 'P'
Address 18 19 1A 1B 1C 1D 1E 1F
Contents (x) (x) (x) (x) 61 62 63 64
Address 20 21 22 23 24 25 26 27


Figure 3-5. Munged Little-Endian Structure S as Seen by the Memory Subsystem 
Note that the mapping shown in Figure 3-5 is not a true little-endian mapping of the
structure S. However, because the processor munges the address when accessing memory,
the physical structure S shown in Figure 3-5 appears to the processor as the structure S
shown in Figure 3-6.






















































































Contents 14 13 12 11 (x) (x) (x) (x)
Address 00 01 02 03 04 05 06 07
Contents 28 27 26 25 24 23 22 21
Address 08 09 0A 0B 0C 0D 0E 0F
Contents 34 33 32 31 'L' 'M' 'N' 'O'
Address 10 11 12 13 14 15 16 17
Contents 'P' 'Q' 'R' (x) 52 51 (x) (x)
Address 18 19 1A 1B 1C 1D 1E 1F
Contents 64 63 62 61 (x) (x) (x) (x)
Address 20 21 22 23 24 25 26 27


Figure 3-6. Munged Little-Endian Structure S as Seen by Processor 


Note that as seen by the program executing in the processor, the mapping for the structure 
S (Figure 3-6) is identical to the little-endian mapping shown in Figure 3-3. However, from 
outside of the processor, the addresses of the bytes making up the structure S are as shown 
in Figure 3-5. These addresses match neither the big-endian mapping of Figure 3-2 nor the 
true little-endian mapping of Figure 3-3. This must be taken into account when performing 
I/O operations in little-endian mode; this is discussed in Section 3.1.4.5, “PowerPC 
Input/Output Data Transfer Addressing in Little-Endian Mode.” 
3.1.4.2 Misaligned Scalars in Little-Endian Mode

Performing an XOR operation on the low-order bits of the address works only if the scalar 
is aligned on a boundary equal to a multiple of its length. Figure 3-7 shows a true little- 
endian mapping of the four-byte word 0x1112_1314, stored at address 05. 









































Contents                      14  13  12
Address   00  01  02  03  04  05  06  07
Contents  11
Address   08  09  0A  0B  0C  0D  0E  0F


Figure 3-7. True Little-Endian Mapping, Word Stored at Address 05 


For the true little-endian example in Figure 3-7, the least-significant byte (0x14) is stored 
at address 0x05, the next byte (0x13) is stored at address 0x06, the third byte (0x12) is 
stored at address 0x07, and the most-significant byte (0x11) is stored at address 0x08. 


When a PowerPC processor, in little-endian mode, issues a single-register load or store 
instruction with a misaligned effective address, it may take an alignment exception. In this 
case, a single-register load or store instruction means any of the integer load/store, 
load/store with byte-reverse, memory synchronization (excluding sync), or floating-point 
load/store (including stfiwx) instructions. PowerPC processors in little-endian mode are not 
required to invoke an alignment exception when such a misaligned access is attempted. The 
processor may handle some or all such accesses without taking an alignment exception. 


The PowerPC architecture requires that half words, words, and double words be placed in 
memory such that the little-endian address of the lowest-order byte is the effective address 
computed by the load or store instruction; the little-endian address of the next-lowest-order 
byte is one greater, and so on. However, because PowerPC processors in little-endian mode 
munge the effective address, the order of the bytes of a misaligned scalar must be as if they 
were accessed one at a time. 


Using the same example as shown in Figure 3-7, when the least-significant byte (0x14) is 
stored to address 0x05, the address is XORed with 0b111 to become 0x02. When the next 
byte (0x13) is stored to address 0x06, the address is XORed with 0b111 to become 0x01. 
When the third byte (0x12) is stored to address 0x07, the address is XORed with 0b111 to 
become 0x00. Finally, when the most-significant byte (0x11) is stored to address 0x08, the 
address is XORed with 0b111 to become 0x0F. Figure 3-8 shows the misaligned word,
stored by a little-endian program, as seen by the memory subsystem. 
Contents  12  13  14
Address   00  01  02  03  04  05  06  07
Contents                                11
Address   08  09  0A  0B  0C  0D  0E  0F


Figure 3-8. Word Stored at Little-Endian Address 05 as Seen by the Memory Subsystem


Note that the misaligned word in this example spans two double words. The two parts of 
the misaligned word are not contiguous as seen by the memory system. An implementation 
may support some but not all misaligned little-endian accesses. For example, a misaligned 
little-endian access that is contained within a double word may be supported, while one that 
spans double words may cause an alignment exception. 
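The byte-at-a-time rule for misaligned scalars can be sketched in C as follows (the helper names are hypothetical). Each byte, least-significant first, is stored to the next effective address and munged as a single-byte access, which reproduces the placement in Figure 3-8; a second helper reports whether the access spans two double words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Misaligned little-endian-mode store, performed as if one byte at a time:
   byte i (least-significant first) goes to EA + i, munged as a 1-byte access. */
static void le_mode_store_bytewise(uint8_t *mem, uint32_t ea, uint32_t value, unsigned len) {
    assert(len >= 1 && len <= 4);
    for (unsigned i = 0; i < len; i++)
        mem[(ea + i) ^ 0x7u] = (uint8_t)(value >> (8 * i));
}

/* True if an access of len bytes at ea crosses a double-word boundary. */
static bool spans_doublewords(uint32_t ea, unsigned len) {
    return (ea >> 3) != ((ea + len - 1) >> 3);
}
```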


3.1.4.3 Nonscalars 
The PowerPC architecture has two types of instructions that handle nonscalars (multiple 
instances of scalars): 


• Load and store multiple instructions
• Load and store string instructions


Because these instructions typically operate on more than one word-length scalar, munging 
cannot be used. These types of instructions cause alignment exception conditions when the 
processor is executing in little-endian mode. Although string accesses are not supported, 
they are inherently byte-based operations, and can be broken into a series of word-aligned 
accesses. 


3.1.4.4 PowerPC Instruction Addressing in Little-Endian Mode 

Each PowerPC instruction occupies an aligned word of memory. PowerPC processors fetch 
and execute instructions as if the current instruction address is incremented by four for each 
sequential instruction. When operating in little-endian mode, the instruction address is 
munged as described in Section 3.1.4.1, “Aligned Scalars in Little-Endian Mode,” for 
fetching word-length scalars; that is, the instruction address is XORed with 0b100. A 
program is thus an array of little-endian words with each word fetched and executed in 
order (not including branches). 


3-10 PowerPC Microprocessor Family: The Programming Environments (32-Bit) 


All instruction addresses visible to an executing program are the effective addresses that are 
computed by that program, or, in the case of the exception handlers, effective addresses that 
were or could have been computed by the interrupted program. These effective addresses 
are independent of the endian mode. Examples for little-endian mode include the 
following: 


• An instruction address placed in the link register by a branch and link operation, or an
instruction address saved in an SPR when an exception is taken, is the address that
a program executing in little-endian mode would use to access the instruction as a
word of data using a load instruction.


• An offset in a relative branch instruction reflects the difference between the
addresses of the branch and target instructions, where the addresses used are those
that a program executing in little-endian mode would use to access the instructions
as data words using a load instruction.


• A target address in an absolute branch instruction is the address that a program
executing in little-endian mode would use to access the target instruction as a word
of data using a load instruction.


• The memory locations that contain the first set of instructions executed by each kind
of exception handler must be set in a manner consistent with the endian mode in
which the exception handler is invoked. Thus, if the exception handler is to be
invoked in little-endian mode, the first set of instructions comprising each kind of
exception handler must appear in memory with the instructions within each double
word reversed from the order in which they are to be executed.
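The last rule can be illustrated with a short Python sketch (the helpers `le_handler_image` and `fetch_order` are hypothetical). Given a handler's instruction words in execution order, it produces the order in which they must appear in memory so that fetches made at EA XOR 0b100 return them in the intended sequence:

```python
from typing import List


def le_handler_image(instructions: List[int]) -> List[int]:
    """Memory order for instruction words (given in execution order) of a handler
    that will be invoked in little-endian mode.

    The two words within each double word are swapped; the sequence must
    contain a whole number of double words (pad it if necessary).
    """
    if len(instructions) % 2:
        raise ValueError("sequence must fill whole double words")
    image: List[int] = []
    for i in range(0, len(instructions), 2):
        image.extend([instructions[i + 1], instructions[i]])
    return image


def fetch_order(image: List[int]) -> List[int]:
    """Words as little-endian fetches see them: word n is read from word slot n XOR 1."""
    return [image[n ^ 1] for n in range(len(image))]


handler = [0xA, 0xB, 0xC, 0xD]     # placeholder words, not real opcodes
image = le_handler_image(handler)  # [0xB, 0xA, 0xD, 0xC]
```

Reading the image back through `fetch_order` recovers the original execution order, which is the property the rule above requires.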


3.1.4.5 PowerPC Input/Output Data Transfer Addressing in Little-Endian Mode


For a PowerPC system running in big-endian mode, both the processor and the memory 
subsystem recognize the same byte as byte 0. However, this is not true for a PowerPC 
system running in little-endian mode because of the munged address bits when the 
processor accesses memory. 


For I/O transfers in little-endian mode to transfer bytes properly, they must be performed
as if the bytes transferred were accessed one at a time, using the little-endian address
modification appropriate for single-byte transfers (that is, the three low-order address bits
must be XORed with 0b111). This does not mean that I/O operations in little-endian
PowerPC systems must be performed using only one-byte-wide transfers. Data transfers
can be as wide as desired, but the order of the bytes within double words must be as if they
were fetched or stored one at a time. That is, for a true little-endian I/O device, the system
must provide a mechanism to munge and unmunge the addresses and reverse the bytes
within a double word (MSB to LSB).
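The following Python sketch (hypothetical helper names, a single aligned double word assumed) stores eight bytes one at a time in little-endian mode, where each byte address is XORed with 0b111. It then shows the byte reversal the system must apply so that a true little-endian device sees each byte at its own address:

```python
from typing import List


def munged_byte_address(ea: int) -> int:
    """Little-endian mode byte access: the three low-order address bits are XORed with 0b111."""
    return ea ^ 0b111


def store_bytes_le_mode(values: List[int]) -> List[int]:
    """Store values[i] at effective address i (i = 0..7), one byte at a time,
    and return the resulting physical double word, physical byte 0 first."""
    physical = [0] * 8
    for ea, value in enumerate(values):
        physical[munged_byte_address(ea)] = value
    return physical


def present_to_le_device(physical: List[int]) -> List[int]:
    """Reverse the bytes within the double word (MSB to LSB) for a true little-endian device."""
    return physical[::-1]


data = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17]
in_memory = store_bytes_le_mode(data)        # bytes land in reverse order
seen_by_device = present_to_le_device(in_memory)
```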
In earlier processors, I/O operations can also be performed with certain devices by storing
to or loading from addresses that are associated with the devices (this is referred to as
direct-store interface operations). However, the direct-store facility is being phased out of
the architecture and is not likely to be supported in future devices. Care must be taken with
such operations when defining the addresses to be used because these addresses are
subjected to munging as described in Section 3.1.4.1, “Aligned Scalars in Little-Endian
Mode.” A load or store that maps to a control register on an external device may require the
bytes of the value transferred to be reversed. If this reversal is required, the load and store
with byte-reverse instructions may be used. See Section 4.2.3.4, “Integer Load and Store
with Byte-Reverse Instructions,” for more information on these instructions.
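To show the effect of a byte-reverse load, here is a small Python model (big-endian mode, memory represented as a `bytes` object indexed by address; the function names are hypothetical). An ordinary word load assembles the bytes most-significant first, while a load word byte-reverse (lwbrx) assembles the same four bytes in the opposite order:

```python
def load_word(memory: bytes, ea: int) -> int:
    """Ordinary word load in big-endian mode: the byte at ea is the most significant."""
    return int.from_bytes(memory[ea:ea + 4], "big")


def load_word_byte_reverse(memory: bytes, ea: int) -> int:
    """Model of lwbrx: the byte at ea becomes the least significant."""
    return int.from_bytes(memory[ea:ea + 4], "little")


# A device register that holds 0x12345678 in little-endian byte order.
register = bytes([0x78, 0x56, 0x34, 0x12])
```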


3.2 Effect of Operand Placement on Performance—VEA

The PowerPC VEA states that the placement (location and alignment) of operands in
memory affects the relative performance of memory accesses. The best performance is
guaranteed if memory operands are aligned on natural boundaries. For more information
on memory access ordering and atomicity, refer to Section 5.1, “The Virtual Environment.”


3.2.1 Summary of Performance Effects 


To obtain the best performance across the widest range of PowerPC processor 
implementations, the programmer should assume the performance model described in 
Table 3-3 and Table 3-4 with respect to the placement of memory operands. 


The performance of accesses varies depending on the following: 


• Operand size
• Operand alignment
• Endian mode (big-endian or little-endian)
• Crossing no boundary
• Crossing a cache block boundary
• Crossing a page boundary
• Crossing a BAT boundary
• Crossing a segment boundary
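A minimal Python sketch of how the placement factors above can be evaluated for a given access. The 4-Kbyte page and 256-Mbyte segment sizes are those of the 32-bit architecture; the 32-byte cache block is an assumption, because block size is implementation dependent, and BAT boundaries are omitted because they depend on how the BAT registers are programmed. The function names are illustrative:

```python
PAGE_SIZE = 4 * 1024               # 32-bit PowerPC page size
SEGMENT_SIZE = 256 * 1024 * 1024   # 32-bit PowerPC segment size
CACHE_BLOCK = 32                   # assumption: implementation dependent


def is_naturally_aligned(ea: int, size: int) -> bool:
    """An operand of size bytes is naturally aligned if its address is a multiple of size."""
    return ea % size == 0


def crosses(ea: int, size: int, boundary: int) -> bool:
    """True if bytes ea .. ea + size - 1 do not all lie in one boundary-sized block."""
    return ea // boundary != (ea + size - 1) // boundary


def widest_boundary_crossed(ea: int, size: int, cache_block: int = CACHE_BLOCK) -> str:
    """Report the largest boundary type crossed by the access (BAT boundaries not modeled)."""
    if crosses(ea, size, SEGMENT_SIZE):
        return "segment"
    if crosses(ea, size, PAGE_SIZE):
        return "page"
    if crosses(ea, size, cache_block):
        return "cache block"
    return "none"
```

A naturally aligned operand no larger than a cache block never crosses any of these boundaries.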
Table 3-3 applies when the processor is in big-endian mode.


Table 3-3. Performance Effects of Memory Operand Placement, Big-Endian Mode 


Operand   Byte        Boundary Crossing
Size      Alignment   None      Cache Block   Page      BAT/Segment

Integer
8 byte    8           Optimal   -             -         -
          4           Good      Good          Poor      Poor
          <4          Poor      Poor          Poor      Poor
4 byte    4           Optimal   -             -         -
          <4          Good      Good          Poor      Poor
2 byte    2           Optimal   -             -         -
          <2          Good      Good          Poor      Poor
1 byte    1           Optimal   -             -         -
lmw,      4           Good      Good          Good¹     Poor
stmw
String    -           Good      Good          Poor      Poor

Floating Point
8 byte    8           Optimal   -             -         -
          4           Good      Good          Poor      Poor
          <4          Poor      Poor          Poor      Poor
4 byte    4           Optimal   -             -         -
          <4          Poor      Poor          Poor      Poor

Note: ¹ Crossing a page boundary where the memory/cache access attributes of the two
pages differ is equivalent to crossing a segment boundary, and thus has poor performance.





Table 3-4 applies when the processor is in little-endian mode. 
Table 3-4. Performance Effects of Memory Operand Placement, Little-Endian Mode


Operand   Byte        Boundary Crossing
Size      Alignment   None      Cache Block   Page      BAT/Segment

Integer
8 byte    8           Optimal   -             -         -
          <8          Poor      Poor          Poor      Poor
4 byte    4           Optimal   -             -         -
          <4          Poor      Poor          Poor      Poor
2 byte    2           Optimal   -             -         -
          <2          Poor      Poor          Poor      Poor

Floating Point
8 byte    8           Optimal   -             -         -
          <8          Poor      Poor          Poor      Poor
4 byte    4           Optimal   -             -         -
          <4          Poor      Poor          Poor      Poor


The load/store multiple and the load/store string instructions are supported only in big- 
endian mode. The load/store multiple instructions are defined by the PowerPC architecture 
to operate only on aligned operands. The load/store string instructions have no alignment 
requirements. 








3.2.2 Instruction Restart 


If a memory access crosses a page, BAT, or segment boundary, a number of conditions
can abort the execution of the instruction after part of the access has been performed. For
example, this may occur when a program attempts to access a page it has not previously
accessed or when the processor must check for a possible change in the memory/cache
access attributes when an access crosses a page boundary. When this occurs, the processor
or the operating system may restart the instruction. If the instruction is restarted, some bytes
of the operand may be loaded from or stored to the target location a second time.


The following rules apply to memory accesses with regard to restarting the instruction: 


• Aligned accesses—A single-register instruction that accesses an aligned operand is
never restarted (that is, it is not partially executed).


• Misaligned accesses—A single-register instruction that accesses a misaligned
operand may be restarted if the access crosses a page, BAT, or segment boundary, or
if the processor is in little-endian mode.


• Load/store multiple, load/store string instructions—These instructions may be
restarted if, in accessing the locations specified by the instruction, a page, BAT, or
segment boundary is crossed.
The programmer should assume that any misaligned access in a segment might be restarted.
When the processor is in big-endian mode, software can ensure that misaligned accesses
are not restarted by placing the misaligned data in BAT areas, as BAT areas have no internal
protection boundaries. Refer to Section 7.4, “Block Address Translation,” for more
information on BAT areas.
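The restart rules and the BAT recommendation can be summarized as a conservative predicate. This Python sketch covers single-register loads and stores only; the function name and the `in_one_bat_block` flag (meaning the whole operand lies inside one BAT area, which the caller must know) are hypothetical:

```python
def assume_restartable(ea: int, size: int, little_endian: bool,
                       in_one_bat_block: bool = False) -> bool:
    """Conservative answer to 'might this single-register access be restarted?'.

    Aligned accesses are never restarted; any misaligned access should be
    assumed restartable, except in big-endian mode when the operand lies
    entirely within one BAT area.
    """
    if ea % size == 0:
        return False          # aligned: never partially executed
    if little_endian:
        return True           # misaligned in little-endian mode
    return not in_one_bat_block
```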


3.3 Floating-Point Execution Models—UISA 


There are two kinds of floating-point instructions defined for the PowerPC architecture: 
computational and noncomputational. The computational instructions consist of those 
operations defined by the IEEE-754 standard for 64- and 32-bit arithmetic (those that 
perform addition, subtraction, multiplication, division, extracting the square root, rounding 
conversion, comparison, and combinations of these) and the multiply-add and reciprocal 
estimate instructions defined by the architecture. The noncomputational floating-point 
instructions consist of the floating-point load, store, and move instructions. While both the 
computational and noncomputational instructions are considered to be floating-point 
instructions governed by the MSR[FP] bit (that allows floating-point instructions to be 
executed), only the computational instructions are considered floating-point operations 
throughout this chapter. 


The IEEE standard requires that single-precision arithmetic be provided for single- 
precision operands. The standard permits double-precision arithmetic instructions to have 
either (or both) single-precision or double-precision operands, but states that single- 
precision arithmetic instructions should not accept double-precision operands. The 
guidelines are as follows: 


• Double-precision arithmetic instructions may have single-precision operands but
always produce double-precision results.

• Single-precision arithmetic instructions require all operands to be single-precision
and always produce single-precision results.


For arithmetic instructions, conversion from double- to single-precision must be done 
explicitly by software, while conversion from single- to double-precision is done implicitly 
by the processor. 
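The asymmetry can be demonstrated in Python, whose `float` is an IEEE-754 double. In this sketch (the function name is illustrative), packing with the `struct` module's `f` format performs an explicit double-to-single rounding under the default round-to-nearest mode, comparable to frsp in that mode. Widening the result back to a double is exact, so a second conversion changes nothing:

```python
import struct


def round_to_single(x: float) -> float:
    """Round a double to the nearest single-precision value and return it widened
    back to a double (an frsp-like step under round to nearest).

    Note: struct raises OverflowError for values beyond the single-precision
    range instead of producing an infinity as the hardware would.
    """
    return struct.unpack(">f", struct.pack(">f", x))[0]


x = 0.1                  # a double that is not representable in single precision
xs = round_to_single(x)  # explicit double-to-single conversion loses accuracy
print(xs)                # 0.10000000149011612
```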


All PowerPC implementations provide the equivalent of the following execution models to
ensure that identical results are obtained. The definition of the arithmetic instructions for
infinities, denormalized numbers, and NaNs follows conventions described in the following
sections. Appendix D, “Floating-Point Models,” has additional detailed information on the
execution models for IEEE operations as well as the other floating-point instructions.


Although the double-precision format specifies an 11-bit exponent, exponent arithmetic
uses two additional bit positions to avoid potential transient overflow conditions. An extra
bit is required when denormalized double-precision numbers are prenormalized. A second
bit is required to permit computation of the adjusted exponent value in the following
examples when the corresponding exception enable bit is 1 (exceptions are referred to as
interrupts in the architecture specification):


• Underflow during multiplication using a denormalized operand
• Overflow during division using a denormalized divisor


3.3.1 Floating-Point Data Format 

The PowerPC UISA defines the representation of a floating-point value in two different
binary, fixed-length formats: a 32-bit format for a single-precision floating-point value
and a 64-bit format for a double-precision floating-point value. The single-precision
format may be used for data in memory. The double-precision format can be used
for data in memory or in floating-point registers (FPRs).


The lengths of the exponent and the fraction fields differ between these two formats. The 
layout of the single-precision format is shown in Figure 3-9; the layout of the double- 
precision format is shown in Figure 3-10. 





S (bit 0) | EXP (bits 1-8) | FRACTION (bits 9-31)


Figure 3-9. Floating-Point Single-Precision Format


S (bit 0) | EXP (bits 1-11) | FRACTION (bits 12-63)


Figure 3-10. Floating-Point Double-Precision Format


Values in floating-point format consist of three fields: 
• S (sign bit)
• EXP (exponent + bias)
• FRACTION (fraction)
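A short Python sketch that extracts the three fields from the single- and double-precision encodings of a value. It relies on the `struct` module to obtain the IEEE-754 bit patterns; the function names are illustrative. Bit 0 is the most-significant bit, as in Figures 3-9 and 3-10:

```python
import struct
from typing import Tuple


def single_fields(x: float) -> Tuple[int, int, int]:
    """(S, EXP, FRACTION) of the single-precision encoding: bit 0, bits 1-8, bits 9-31."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def double_fields(x: float) -> Tuple[int, int, int]:
    """(S, EXP, FRACTION) of the double-precision encoding: bit 0, bits 1-11, bits 12-63."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)
```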


If only a portion of a floating-point data item in memory is accessed, as with a load or store 
instruction for a byte or half word (or word in the case of floating-point double-precision 
format), the value affected depends on whether the PowerPC system is using big- or little- 
endian byte ordering, which is described in Section 3.1.2, “Byte Ordering.” Big-endian 
mode is the default. 




For numeric values, the significand consists of a leading implied bit concatenated on the 
right with the FRACTION. This leading implied bit is a 1 for normalized numbers and a 0 
for denormalized numbers and is the first bit to the left of the binary point. Values 
representable within the two floating-point formats can be specified by the parameters 
listed in Table 3-5. 


Table 3-5. IEEE Floating-Point Fields 


Parameter                     Single-Precision   Double-Precision
Exponent bias                 +127               +1023
Maximum exponent (unbiased)   +127               +1023
Minimum exponent (unbiased)   -126               -1022


The true value of the exponent can be determined by subtracting 127 for single-precision
numbers and 1023 for double-precision numbers. This is shown in Table 3-6. Note that two
exponent values are reserved to represent special-case values. An exponent field with all
bits set indicates that the value is an infinity or NaN, and an exponent field with all bits
cleared indicates that the number is either zero or denormalized.





Table 3-6. Biased Exponent Format


Biased Exponent   Single-Precision   Double-Precision
(Binary)          (Unbiased)         (Unbiased)

11 . . . 11       Reserved for infinities and NaNs
11 . . . 10       +127               +1023
11 . . . 01       +126               +1022
. . .             . . .              . . .
10 . . . 00       +1                 +1
01 . . . 11       0                  0
01 . . . 10       -1                 -1
. . .             . . .              . . .
00 . . . 01       -126               -1022
00 . . . 00       Reserved for zeros and denormalized numbers
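The mapping in Table 3-6 amounts to subtracting the bias and treating the all-ones and all-zeros exponents as reserved. A sketch in Python (the function name and return strings are illustrative):

```python
from typing import Union

BIAS = {"single": 127, "double": 1023}
ALL_ONES = {"single": 0xFF, "double": 0x7FF}


def unbiased_exponent(biased: int, precision: str = "double") -> Union[int, str]:
    """Return the unbiased exponent, or a description of the reserved encoding."""
    if biased == ALL_ONES[precision]:
        return "reserved: infinity or NaN"
    if biased == 0:
        return "reserved: zero or denormalized"
    return biased - BIAS[precision]
```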





3.3.1.1 Value Representation 

The PowerPC UISA defines numerical and nonnumerical values representable within 
single- and double-precision formats. The numerical values are approximations to the real 
numbers and include the normalized numbers, denormalized numbers, and zero values. The 
nonnumerical values representable are the positive and negative infinities and the NaNs. 
The positive and negative infinities are adjoined to the real numbers but are not numbers 
themselves, and the standard rules of arithmetic do not hold when they appear in an 
operation. They are related to the real numbers by order alone. It is possible, however, to 
define restricted operations among numbers and infinities as defined below. The relative 
location on the real number line for each of the defined numerical entities is shown in 
Figure 3-11. Tiny values include denormalized numbers and all numbers that are too small
to be represented for a particular precision format; they do not include ±0.





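[Figure: the real number line, ordered -∞, -NORM, -DENORM, 0, +DENORM, +NORM, +∞. The regions labeled Tiny lie on either side of zero and contain the denormalized numbers and the unrepresentable small numbers.]


Figure 3-11. Approximation to Real Numbers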

















The positive and negative NaNs are encodings that convey diagnostic information such as
the representation of uninitialized variables and are not related to the numbers, ±∞, or each
other by order or value.


Table 3-7 describes each of the floating-point formats. 


Table 3-7. Recognized Floating-Point Numbers 


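Sign Bit   Biased Exponent   Implied Bit   Fraction   Value
0          Maximum           -             Nonzero    NaN
0          Maximum           -             Zero       +Infinity
0          0 < Exp < Max     1             Any        +Normalized number
0          0                 0             Nonzero    +Denormalized number
0          0                 0             Zero       +0
1          0                 0             Zero       -0
1          0                 0             Nonzero    -Denormalized number
1          0 < Exp < Max     1             Any        -Normalized number
1          Maximum           -             Zero       -Infinity
1          Maximum           -             Nonzero    NaN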


The following sections describe floating-point values defined in the architecture. 







3.3.1.2 Binary Floating-Point Numbers 

Binary floating-point numbers are machine-representable values used to approximate real 
numbers. Three categories of numbers are supported—normalized numbers, denormalized 
numbers, and zero values. 


3.3.1.3 Normalized Numbers (±NORM)
The values for normalized numbers have a biased exponent value in the range:
• 1-254 in single-precision format
• 1-2046 in double-precision format
The implied unit bit is one. Normalized numbers are interpreted as follows:
$$\mathrm{NORM} = (-1)^{s} \times 2^{E} \times (1.\mathrm{fraction})$$


The variable (s) is the sign, (E) is the unbiased exponent, and (1.fraction) is the significand 
composed of a leading unit bit (implied bit) and a fractional part. The format for normalized 
numbers is shown in Figure 3-12. 


SIGN BIT, 0 OR 1 | MIN < EXPONENT < MAX (BIASED) | FRACTION = ANY BIT PATTERN


Figure 3-12. Format for Normalized Numbers
The ranges covered by the magnitude (M) of a normalized floating-point number are
approximated in the following decimal representation:
Single-precision format:
$$1.2 \times 10^{-38} \le M \le 3.4 \times 10^{38}$$
Double-precision format:
$$2.2 \times 10^{-308} \le M \le 1.8 \times 10^{308}$$
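The NORM formula can be checked with exact rational arithmetic. This Python sketch (illustrative function name; `fractions.Fraction` keeps every step exact) evaluates the smallest and largest normalized values of each format, which correspond to the decimal ranges above:

```python
from fractions import Fraction


def norm_value(sign: int, biased_exp: int, fraction: int, frac_bits: int, bias: int) -> Fraction:
    """NORM = (-1)**s * 2**E * (1.fraction), where E = biased_exp - bias."""
    significand = 1 + Fraction(fraction, 1 << frac_bits)  # implied unit bit is 1
    return (-1) ** sign * Fraction(2) ** (biased_exp - bias) * significand


# Smallest and largest normalized magnitudes (biased exponents 1 and maximum - 1).
single_min = norm_value(0, 1, 0, 23, 127)
single_max = norm_value(0, 254, (1 << 23) - 1, 23, 127)
double_min = norm_value(0, 1, 0, 52, 1023)
double_max = norm_value(0, 2046, (1 << 52) - 1, 52, 1023)
print(float(single_min), float(single_max))  # roughly 1.2e-38 and 3.4e+38
print(float(double_min), float(double_max))  # roughly 2.2e-308 and 1.8e+308
```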
3.3.1.4 Zero Values (±0)


Zero values have a biased exponent value of zero and a fraction of zero. This is shown in
Figure 3-13. Zeros can have a positive or negative sign. The sign of zero is ignored by
comparison operations (that is, comparison regards +0 as equal to -0). Arithmetic with zero
results is always exact and does not signal any exception, except when an exception occurs
due to an invalid operation as described in Section 3.3.6.1.1, “Invalid Operation
Exception Condition.” Rounding a zero result only affects the sign (±0).





SIGN BIT, 0 OR 1 | EXPONENT = 0 (BIASED) | FRACTION = 0


Figure 3-13. Format for Zero Numbers


3.3.1.5 Denormalized Numbers (±DENORM)


Denormalized numbers have a biased exponent value of zero and a nonzero fraction. The 
format for denormalized numbers is shown in Figure 3-14. 


SIGN BIT, 0 OR 1 | EXPONENT = 0 (BIASED) | FRACTION = ANY NONZERO BIT PATTERN


Figure 3-14. Format for Denormalized Numbers











Denormalized numbers are nonzero numbers smaller in magnitude than the normalized 
numbers. They are values in which the implied unit bit is zero. Denormalized numbers are 
interpreted as follows: 


$$\mathrm{DENORM} = (-1)^{s} \times 2^{E_{\min}} \times (0.\mathrm{fraction})$$


The value Emin is the minimum unbiased exponent value for a normalized number (-126
for single-precision, -1022 for double-precision).
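The DENORM formula can likewise be evaluated exactly. In this Python sketch (illustrative function name, exact arithmetic through `fractions.Fraction`), the implied unit bit is 0 and the scale factor is fixed at 2**Emin:

```python
from fractions import Fraction


def denorm_value(sign: int, fraction: int, frac_bits: int, emin: int) -> Fraction:
    """DENORM = (-1)**s * 2**Emin * (0.fraction)."""
    if fraction == 0:
        raise ValueError("a zero fraction with a zero exponent encodes a zero, not a denormalized number")
    return (-1) ** sign * Fraction(2) ** emin * Fraction(fraction, 1 << frac_bits)


smallest_single = denorm_value(0, 1, 23, -126)   # 2**-149
smallest_double = denorm_value(0, 1, 52, -1022)  # 2**-1074
```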
3.3.1.6 Infinities (±∞)

These are values that have the maximum biased exponent value of 255 in the single- 
precision format, 2047 in the double-precision format, and a zero fraction value. They are 
used to approximate values greater in magnitude than the maximum normalized value. 
Infinity arithmetic is defined as the limiting case of real arithmetic, with restricted 
operations defined among numbers and infinities. Infinities and the real numbers can be 
related by ordering in the affine sense: 


$$-\infty < \text{every finite number} < +\infty$$


The format for infinities is shown in Figure 3-15. 


SIGN BIT, 0 OR 1 | EXPONENT = MAXIMUM (BIASED) | FRACTION = 0


Figure 3-15. Format for Positive and Negative Infinities


Arithmetic using infinite numbers is always exact and does not signal any exception, except
when an exception occurs due to an invalid operation as described in Section 3.3.6.1.1,
“Invalid Operation Exception Condition.”


3.3.1.7 Not a Numbers (NaNs) 


NaNs have the maximum biased exponent value and a nonzero fraction. The format for 
NaNs is shown in Figure 3-16. The sign bit of NaN does not show an algebraic sign; rather, 
it is simply another bit in the NaN. If the highest-order bit of the fraction field is a zero, the 
NaN is a signaling NaN; otherwise it is a quiet NaN (QNaN). 


SIGN BIT (ignored) | EXPONENT = MAXIMUM (BIASED) | FRACTION = ANY NONZERO BIT PATTERN


Figure 3-16. Format for NaNs


Signaling NaNs signal exceptions when they are specified as arithmetic operands. 


Quiet NaNs represent the results of certain invalid operations, such as attempts to perform 
arithmetic operations on infinities or NaNs, when the invalid operation exception is 
disabled (FPSCR[VE] = 0). Quiet NaNs propagate through all operations, except floating- 
point round to single-precision, ordered comparison, and conversion to integer operations, 
and signal exceptions only for ordered comparison and conversion to integer operations. 
Specific encodings in QNaNs can thus be preserved through a sequence of operations and 
used to convey diagnostic information to help identify results from invalid operations. 
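Putting the encoding rules of Sections 3.3.1.3 through 3.3.1.7 together, a double-precision bit pattern can be classified as in the following Python sketch. The function names are illustrative, and the operand is taken as a raw 64-bit integer so that signaling NaNs are not disturbed by conversion to a Python `float`:

```python
import struct


def bits_of(x: float) -> int:
    """64-bit double-precision encoding of a Python float."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def classify_double(bits: int) -> str:
    """Classify a 64-bit encoding as a zero, denormalized, normalized, infinity, or NaN."""
    sign = "-" if bits >> 63 else "+"
    exp = (bits >> 52) & 0x7FF
    frac = bits & ((1 << 52) - 1)
    if exp == 0x7FF:
        if frac == 0:
            return sign + "infinity"
        # Highest-order fraction bit: 1 for a quiet NaN, 0 for a signaling NaN.
        return "QNaN" if frac >> 51 else "SNaN"
    if exp == 0:
        return sign + ("denormalized" if frac else "zero")
    return sign + "normalized"
```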
When a QNaN results from an operation because an operand is a NaN or because a QNaN
is generated due to a disabled invalid operation exception, the following rule is applied to
determine the QNaN to be stored as the result:

If (frA) is a NaN
    Then frD ← (frA)
Else if (frB) is a NaN
    Then if instruction is frsp
        Then frD ← (frB)[0-34] || (29)0
        Else frD ← (frB)
Else if (frC) is a NaN
    Then frD ← (frC)
Else if generated QNaN
    Then frD ← generated QNaN


If the operand specified by frA is a NaN, that NaN is stored as the result. Otherwise, if the
operand specified by frB is a NaN (if the instruction specifies an frB operand), that NaN is
stored as the result, with the low-order 29 bits cleared if the instruction is frsp. Otherwise,
if the operand specified by frC is a NaN (if the instruction specifies an frC operand), that
NaN is stored as the result. Otherwise, if a QNaN is generated by a disabled invalid
operation exception, that QNaN is stored as the result. If a QNaN is to be generated as a
result, the QNaN generated has a sign bit of zero, an exponent field of all ones, and a
highest-order fraction bit of one with all other fraction bits zero. An instruction that
generates a QNaN as the result of a disabled invalid operation generates this QNaN. This
is shown in Figure 3-17.
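The selection rule above translates directly into code. In this Python sketch the FPR contents are raw 64-bit integers, and the function name and its flags (`is_frsp`, `generated`) are hypothetical. It implements only the selection of frD as listed and does not model exception signaling or status bits:

```python
from typing import Optional

EXP_MASK = 0x7FF << 52
FRAC_MASK = (1 << 52) - 1
GENERATED_QNAN = 0x7FF8000000000000   # sign 0, exponent all ones, fraction 1000...0


def is_nan(bits: int) -> bool:
    """Maximum biased exponent with a nonzero fraction."""
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def nan_result(fra: Optional[int], frb: Optional[int], frc: Optional[int],
               is_frsp: bool = False, generated: bool = False) -> Optional[int]:
    """Apply the NaN selection rule; operands are raw 64-bit FPR images.

    None marks an operand the instruction does not specify. Returns the value
    for frD, or None when no NaN rule applies.
    """
    if fra is not None and is_nan(fra):
        return fra
    if frb is not None and is_nan(frb):
        if is_frsp:
            return frb & ~((1 << 29) - 1)   # (frB)[0-34] || 29 zero bits
        return frb
    if frc is not None and is_nan(frc):
        return frc
    if generated:
        return GENERATED_QNAN
    return None
```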


SIGN BIT (ignored) = 0 | EXPONENT = 111...1 | FRACTION = 1000...0


Figure 3-17. Representation of Generated QNaN


3.3.2 Sign of Result 


The following rules govern the sign of the result of an arithmetic operation, when the
operation does not yield an exception. These rules apply even when the operands or results
are ±0 or ±∞:


• The sign of the result of an addition operation is the sign of the source operand
having the larger absolute value. If both operands have the same sign, the sign of the
result of an addition operation is the same as the sign of the operands. The sign of
the result of the subtraction operation, x - y, is the same as the sign of the result of
the addition operation, x + (-y).


• When the sum of two operands with opposite sign, or the difference of two operands
with the same sign, is exactly zero, the sign of the result is positive in all rounding
modes except round toward negative infinity (-∞), in which case the sign is negative.


• The sign of the result of a multiplication or division operation is the XOR of the
signs of the source operands.
• The sign of the result of a round to single-precision or convert to/from integer
operation is the sign of the source operand.


• The sign of the result of a square root or reciprocal square root estimate operation is
always positive, except that the square root of -0 is -0 and the reciprocal square root
of -0 is -infinity.


For multiply-add instructions, these rules are applied first to the multiplication operation 
and then to the addition or subtraction operation (one of the source operands to the addition 
or subtraction operation is the result of the multiplication operation). 
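The sign rules can be expressed as a small Python sketch. The function names and the rounding-mode string are hypothetical, the operands are assumed not to be NaNs, and because Python cannot change the hardware rounding mode, the mode is passed as a parameter and only the sign is computed:

```python
import math


def _sign(x: float) -> int:
    """0 for a positive sign, 1 for a negative sign (including -0.0)."""
    return 1 if math.copysign(1.0, x) < 0 else 0


def add_sign(x: float, y: float, rounding: str = "nearest") -> int:
    """Sign of x + y per the rules above (operands assumed not to be NaNs)."""
    sx, sy = _sign(x), _sign(y)
    if sx == sy:
        return sx                               # same signs: result has that sign
    if abs(x) != abs(y):
        return sx if abs(x) > abs(y) else sy    # larger magnitude determines the sign
    if math.isinf(x):
        raise ValueError("infinity minus infinity is an invalid operation")
    # Opposite signs and equal magnitudes: the sum is exactly zero.
    return 1 if rounding == "toward_minus_infinity" else 0


def sub_sign(x: float, y: float, rounding: str = "nearest") -> int:
    """x - y takes the sign of x + (-y)."""
    return add_sign(x, -y, rounding)


def mul_div_sign(x: float, y: float) -> int:
    """Multiplication and division: XOR of the operand signs."""
    return _sign(x) ^ _sign(y)
```

With Python's default round-to-nearest arithmetic, `math.copysign(1.0, 2.0 + -2.0)` evaluates to `1.0`, consistent with `add_sign(2.0, -2.0)` returning 0.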


3.3.3 Normalization and Denormalization 


The intermediate result of an arithmetic or Floating Round to Single-Precision (frspx) 
instruction may require normalization and/or denormalization. When an intermediate result 
consists of a sign bit, an exponent, and a nonzero significand with a zero leading bit, the 
result must be normalized (and rounded) before being stored to the target. 


A number is normalized by shifting its significand left and decrementing its exponent by 
one for each bit shifted until the leading significand bit becomes one. The guard and round 
bits are also shifted, with zeros shifted into the round bit; see Section D.1, “Execution 
Model for IEEE Operations,” for information about the guard and round bits. During 
normalization, the exponent is regarded as if its range were unlimited. 


If an intermediate result has a nonzero significand and an exponent that is smaller than the 
minimum value that can be represented in the format specified for the result, this value is 
referred to as ‘tiny’ and the stored result is determined by the rules described in Section 
3.3.6.2.2, “Underflow Exception Condition.” These rules may involve denormalization. 
The sign of the number does not change. 


An exponent can become tiny in either of the following circumstances: 


• As the result of an arithmetic or Floating Round to Single-Precision (frspx)
instruction or


• As the result of decrementing the exponent in the process of normalization.


Normalization is the process of coercing the leading significand bit to be a 1 while 
denormalization is the process of coercing the exponent into the target format's range. In 
denormalization, the significand is shifted to the right while the exponent is incremented 
for each bit shifted until the exponent equals the format’s minimum value. The result is then 
rounded. If any significand bits are lost due to the rounding of the shifted value, the result 
is considered inexact. The sign of the number does not change. 
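A sketch of denormalization followed by rounding, under stated assumptions: the intermediate result is modeled as an integer significand `m` carrying `frac_bits` fraction bits (value m × 2**(e - frac_bits)), rounding is round to nearest with ties to even, and the function name is hypothetical. Each right shift increments the exponent by one until it reaches the format's minimum, and any nonzero shifted-out bits make the result inexact:

```python
from typing import Tuple


def denormalize(m: int, e: int, frac_bits: int, emin: int) -> Tuple[int, int, bool]:
    """Denormalize and round (to nearest, ties to even) a tiny intermediate result.

    The intermediate value is m * 2**(e - frac_bits), where m carries the
    leading significand bit followed by frac_bits fraction bits.
    Returns (rounded significand, resulting exponent, inexact flag).
    """
    shift = emin - e                    # one right shift per exponent increment
    if shift <= 0:
        return m, e, False              # exponent already in range
    kept = m >> shift
    lost = m & ((1 << shift) - 1)       # bits shifted out of the significand
    half = 1 << (shift - 1)
    if lost > half or (lost == half and kept & 1):
        kept += 1                       # round up (a carry can restore the minimum normal)
    return kept, emin, lost != 0
```

For example, with single-precision parameters (`frac_bits=23`, `emin=-126`), the value 1.5 × 2**-127 denormalizes exactly to significand 0x600000, which is also the fraction field of its single-precision encoding.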
3.3.4 Data Handling and Precision


There are specific instructions for moving floating-point data between the FPRs and 
memory. For double-precision format data, the data is not altered during the move. For 
single-precision data, the format is converted to double-precision format when data is 
loaded from memory into an FPR. A format conversion from double- to single-precision is 
performed when data from an FPR is stored as single-precision. These operations do not 
cause floating-point exceptions. 


All floating-point arithmetic, move, and select instructions use floating-point double- 
precision format. 


Floating-point single-precision formats are obtained by using the following four types of 
instructions: 


• Load floating-point single-precision instructions—These instructions access a
single-precision operand in single-precision format in memory, convert it to double-
precision, and load it into an FPR. Floating-point exceptions do not occur during the
load operation.


• Floating Round to Single-Precision (frspx) instruction—The frspx instruction
rounds a double-precision operand to single-precision, checking the exponent for
single-precision range and handling any exceptions according to respective enable
bits in the FPSCR. The instruction places that operand into an FPR as a double-
precision operand. For results produced by single-precision arithmetic instructions
and by single-precision loads, this operation does not alter the value.


• Single-precision arithmetic instructions—These instructions take operands from the
FPRs in double-precision format, perform the operation as if it produced an
intermediate result correct to infinite precision and with unbounded range, and then
force this intermediate result to fit in single-precision format. Status bits in the
FPSCR and in the condition register are set to reflect the single-precision result. The
result is then converted to double-precision format and placed into an FPR. The
result falls within the range supported by the single-precision format.


Source operands for these instructions must be representable in single-precision
format. Otherwise, the result placed into the target FPR and the setting of status bits
in the FPSCR, and in the condition register if update mode is selected, are undefined.


• Store floating-point single-precision instructions—These instructions convert a
double-precision operand to single-precision format and store that operand into
memory. If the operand requires denormalization in order to fit in single-precision
format, it is automatically denormalized prior to being stored. No exceptions are
detected on the store operation (the value being stored is effectively assumed to be
the result of an instruction of one of the preceding three types).


When the result of a Load Floating-Point Single (lfs), Floating Round to Single-Precision
(frspx), or single-precision arithmetic instruction is stored in an FPR, the low-order 29
fraction bits are zero. This is shown in Figure 3-18.
S (bit 0) | EXP (bits 1-11) | FRACTION (bits 12-34) | 000...0 (bits 35-63)


Figure 3-18. Single-Precision Representation in an FPR
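The zero low-order bits can be observed directly. In this Python sketch (function names illustrative), `load_single` mimics lfs by converting a value to single precision and widening it to a double, and `fpr_image` returns the 64-bit pattern that would occupy the FPR:

```python
import struct

LOW_29_BITS = (1 << 29) - 1


def fpr_image(x: float) -> int:
    """64-bit double-precision bit pattern of x, as held in an FPR."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def load_single(x: float) -> float:
    """Model of lfs: x rounded to single precision (round to nearest), then widened to double."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


print(hex(fpr_image(0.1)))               # 0x3fb999999999999a: low 29 bits nonzero
print(hex(fpr_image(load_single(0.1))))  # 0x3fb99999a0000000: low 29 bits zero
```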


The frspx instruction allows conversion from double- to single-precision with appropriate 
exception checking and rounding. This instruction should be used to convert double- 
precision floating-point values (produced by double-precision load and arithmetic 
instructions) to single-precision values before storing them into single-format memory 
elements or using them as operands for single-precision arithmetic instructions. Values 
produced by single-precision load and arithmetic instructions can be stored directly, or used 
directly as operands for single-precision arithmetic instructions, without being preceded by 
an frspx instruction. 


A single-precision value can be used in double-precision arithmetic operations. The reverse 
is true only if the double-precision value can be represented in single-precision format. 
Some implementations may execute single-precision arithmetic instructions faster than 
double-precision arithmetic instructions. Therefore, if double-precision accuracy is not 
required, using single-precision data and instructions may speed operations in some 
implementations. 


3.3.5 Rounding 


All arithmetic, rounding, and conversion instructions defined by the PowerPC architecture 
(except the optional Floating Reciprocal Estimate Single (fresx) and Floating Reciprocal 
Square Root Estimate (frsqrtex) instructions) produce an intermediate result considered to 
be infinitely precise and with unbounded exponent range. This intermediate result is 
normalized or denormalized if required, and then rounded to the destination format. The 
final result is then placed into the target FPR in the double-precision format or in fixed-point 
format, depending on the instruction. 


The IEEE-754 specification allows loss of accuracy to be defined as when the rounded 
result differs from the infinitely precise value with unbounded range (same as the definition 
of ‘inexact’). In the PowerPC architecture, this is the way loss of accuracy is detected. 


Let Z be the intermediate arithmetic result (with infinite precision and unbounded range) or 
the operand of a conversion operation. If Z can be represented exactly in the target format, 
then the result in all rounding modes is exactly Z. If Z cannot be represented exactly in the 
target format, let Z1 and Z2 be the next larger and next smaller numbers representable in 
the target format that bound Z; then Z1 or Z2 can be used to approximate the result in the 
target format. 
Figure 3-19 shows a graphical representation of Z, Z1, and Z2 in this case.


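[Figure: number line with negative values to the left of 0 and positive values to the right. The infinitely precise value Z lies between Z2 and Z1; one neighbor is obtained by truncating Z after its lsb and the other by incrementing the lsb of Z.]


Figure 3-19. Relation of Z1 and Z2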


Four rounding modes are available through the floating-point rounding control field (RN) 
in the FPSCR. See Section 2.1.4, “Floating-Point Status and Control Register (FPSCR).” 
These are encoded as follows in Table 3-8. 


Table 3-8. FPSCR Bit Settings—RN Field 
RN   Rounding Mode            Rules
00   Round to nearest         Choose the best approximation (Z1 or Z2). In case of a tie,
                              choose the one that is even (least-significant bit 0).
01   Round toward zero        Choose the smaller in magnitude (Z1 or Z2).
10   Round toward +infinity   Choose Z1.
11   Round toward -infinity   Choose Z2.
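The selection in Table 3-8 can be modeled with exact rationals. This Python sketch assumes, for simplicity, that the representable values near Z are evenly spaced `ulp` apart (which holds within a single binade of a real format); the function names are illustrative, and `rn` uses the 2-bit encodings listed in the table:

```python
import math
from fractions import Fraction
from typing import Tuple


def bounds(z: Fraction, ulp: Fraction) -> Tuple[Fraction, Fraction]:
    """Return (Z1, Z2): the representable values just above and just below z."""
    return math.ceil(z / ulp) * ulp, math.floor(z / ulp) * ulp


def round_rn(z: Fraction, ulp: Fraction, rn: int) -> Fraction:
    """Select Z1 or Z2 according to the FPSCR[RN] encoding."""
    z1, z2 = bounds(z, ulp)
    if z1 == z2:
        return z                        # Z is exactly representable
    if rn == 0b00:                      # round to nearest
        if z1 - z != z - z2:
            return z1 if z1 - z < z - z2 else z2
        return z1 if (z1 / ulp) % 2 == 0 else z2   # tie: least-significant bit 0
    if rn == 0b01:                      # round toward zero: smaller magnitude
        return z2 if z > 0 else z1
    if rn == 0b10:                      # round toward +infinity
        return z1
    if rn == 0b11:                      # round toward -infinity
        return z2
    raise ValueError("RN is a 2-bit field")
```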


See Section D.1, “Execution Model for IEEE Operations,” for a detailed explanation of 
rounding. Rounding occurs before an overflow condition is detected. This means that while 
an infinitely precise value with unbounded exponent range may be greater than the greatest 
representable value, the rounding mode may allow that value to be rounded to a 
representable value. In this case, no overflow condition occurs. 
However, the underflow condition is tested before rounding. Therefore, if the value that is
infinitely precise and with unbounded exponent range falls within the range of 
unrepresentable values, the underflow condition occurs. The results in these cases are 
defined in Section 3.3.6.2.2, “Underflow Exception Condition.” Figure 3-20 shows the 
selection of Z1 and Z2 for the four possible rounding modes that are provided by 
FPSCR[RN]. 


[Figure: flowchart for selecting the result, where Z is the infinitely precise result or operand. If Z fits the target format, frD ← Z. Otherwise Z2 < Z < Z1 (per Figure 3-19) and FPSCR[RN] selects the result: RN = 00 (round to nearest), frD ← the best approximation (Z1 or Z2), choosing the one with lsb 0 on a tie; RN = 01 (round toward 0), frD ← Z2 if Z > 0, otherwise Z1; RN = 10 (round toward +∞), frD ← Z1; RN = 11 (round toward -∞), frD ← Z2.]


Figure 3-20. Selection of Z1 and Z2 for the Four Rounding Modes





All arithmetic, rounding, and conversion instructions affect FPSCR bits FR and FI, 
according to whether the rounded result is inexact (FI) and whether the fraction was 
incremented (FR) as shown in Figure 3-21. If the rounded result is inexact, FI is set and FR 
may be either set or cleared. If rounding does not change the result, both FR and FI are 
cleared. The optional fresx and frsqrtex instructions set FI and FR to undefined values; 
other floating-point instructions do not alter FR and FI. 
[Figure 3-21 (flow diagram): Zround is the rounded result. If Zround = Z, then FI ← 0 and FR ← 0. Otherwise FI ← 1, and FR ← 1 if the fraction was incremented, else FR ← 0.]


Figure 3-21. Rounding Flags in FPSCR
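The FI and FR rules of Figure 3-21 can be observed on an ordinary IEEE 754 host by rounding a double to single precision (roughly what frsp does) and comparing Zround with Z. This sketch assumes finite inputs that stay within single-precision range and the host's default round-to-nearest mode; `round_flags` and `single_round_flags` are hypothetical names, not part of any PowerPC library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Software analogues of FPSCR[FI] and FPSCR[FR]. */
typedef struct {
    bool fi;  /* rounded result is inexact */
    bool fr;  /* fraction (magnitude) was incremented */
} round_flags;

/* Round z to single precision with the host's double-to-float conversion and
 * derive FI and FR by comparing Zround with Z. NaNs and values that overflow
 * single precision are not handled. */
round_flags single_round_flags(double z, float *zround_out)
{
    float zround = (float)z;
    round_flags f;
    f.fi = (double)zround != z;
    /* FR: Zround lies farther from zero than Z, i.e. the magnitude grew. */
    f.fr = f.fi && (z >= 0.0 ? (double)zround > z : (double)zround < z);
    if (zround_out != NULL)
        *zround_out = zround;
    return f;
}
```

Here 1 + 2^-30 rounds down to 1.0 (FI = 1, FR = 0), while 1 + 0.75 × 2^-23 rounds up to 1 + 2^-23 (FI = 1, FR = 1).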


3.3.6 Floating-Point Program Exceptions 


The computational instructions of the PowerPC architecture are the only instructions that 
can cause floating-point enabled exceptions (subsets of the program exception). In the 
processor, floating-point program exceptions are signaled by condition bits set in the 
floating-point status and control register (FPSCR) as described in this section and in 
Chapter 2, “PowerPC Register Set.” These bits correspond to those conditions identified as 
IEEE floating-point exceptions and can cause the system floating-point enabled exception 
error handler to be invoked. Handling for floating-point exceptions is described in 
Section 6.4.7, “Program Exception (0x00700).” 


The FPSCR is shown in Figure 3-22. 





[Figure 3-22 (register layout; bit 0 is the most-significant bit):
0 FX, 1 FEX, 2 VX, 3 OX, 4 UX, 5 ZX, 6 XX, 7 VXSNAN, 8 VXISI, 9 VXIDI, 10 VXZDZ,
11 VXIMZ, 12 VXVC, 13 FR, 14 FI, 15-19 FPRF, 20 reserved, 21 VXSOFT, 22 VXSQRT,
23 VXCVI, 24 VE, 25 OE, 26 UE, 27 ZE, 28 XE, 29 NI, 30-31 RN]


Figure 3-22. Floating-Point Status and Control Register (FPSCR) 
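Because PowerPC numbers register bits from the most-significant end, software that inspects a saved FPSCR image has to convert bit numbers to C masks. This sketch assumes the 32-bit FPSCR value is already held in a `uint32_t` (how it is obtained is outside the sketch); `FPSCR_BIT`, `fpscr_field`, and the other names are hypothetical helpers.

```c
#include <stdint.h>

/* FPSCR bit n (bit 0 = most-significant) corresponds to the C mask 1 << (31 - n). */
#define FPSCR_BIT(n) (UINT32_C(1) << (31 - (n)))

#define FPSCR_FX  FPSCR_BIT(0)
#define FPSCR_FEX FPSCR_BIT(1)
#define FPSCR_NI  FPSCR_BIT(29)

/* Extract the field occupying bits first..last inclusive (first <= last). */
uint32_t fpscr_field(uint32_t fpscr, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    uint32_t mask = (width == 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    return (fpscr >> (31 - last)) & mask;
}

/* FPRF is bits 15-19; RN is bits 30-31. */
uint32_t fpscr_fprf(uint32_t fpscr) { return fpscr_field(fpscr, 15, 19); }
uint32_t fpscr_rn(uint32_t fpscr)   { return fpscr_field(fpscr, 30, 31); }
```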
A listing of FPSCR bit settings is shown in Table 3-9.
Table 3-9. FPSCR Bit Settings
Bit(s) Name Description
0 FX Floating-point exception summary. Every floating-point instruction, except mtfsfi and mtfsf,
implicitly sets FPSCR[FX] if that instruction causes any of the floating-point exception bits in
the FPSCR to transition from 0 to 1. The mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1
instructions can alter FPSCR[FX] explicitly. This is a sticky bit.


1 FEX Floating-point enabled exception summary. This bit signals the occurrence of any of the
enabled exception conditions. It is the logical OR of all the floating-point exception bits
masked by their respective enable bits (FEX = (VX & VE) | (OX & OE) | (UX & UE) | (ZX & ZE) |
(XX & XE)). The mcrfs, mtfsf, mtfsfi, mtfsb0, and mtfsb1 instructions cannot alter
FPSCR[FEX] explicitly. This is not a sticky bit.


2 VX Floating-point invalid operation exception summary. This bit signals the occurrence of any
invalid operation exception. It is the logical OR of all of the invalid operation exception bits as
described in Section 3.3.6.1.1, “Invalid Operation Exception Condition.” The mcrfs, mtfsf,
mtfsfi, mtfsb0, and mtfsb1 instructions cannot alter FPSCR[VX] explicitly. This is not a sticky
bit.


3 OX Floating-point overflow exception. This is a sticky bit. See Section 3.3.6.2, “Overflow,
Underflow, and Inexact Exception Conditions.”


4 UX Floating-point underflow exception. This is a sticky bit. See Section 3.3.6.2.2, “Underflow
Exception Condition.”


5 ZX Floating-point zero divide exception. This is a sticky bit. See Section 3.3.6.1.2, “Zero Divide
Exception Condition.”


6 XX Floating-point inexact exception. This is a sticky bit. See Section 3.3.6.2.3, “Inexact Exception
Condition.”
FPSCR[XX] is the sticky version of FPSCR[FI]. The following rules describe how FPSCR[XX]
is set by a given instruction:
• If the instruction affects FPSCR[FI], the new value of FPSCR[XX] is obtained by logically
ORing the old value of FPSCR[XX] with the new value of FPSCR[FI].
• If the instruction does not affect FPSCR[FI], the value of FPSCR[XX] is unchanged.


7 VXSNAN Floating-point invalid operation exception for SNaN. This is a sticky bit. See Section 3.3.6.1.1,
“Invalid Operation Exception Condition.”
8 VXISI Floating-point invalid operation exception for ∞ - ∞. This is a sticky bit. See Section 3.3.6.1.1,
“Invalid Operation Exception Condition.”
9 VXIDI Floating-point invalid operation exception for ∞ ÷ ∞. This is a sticky bit. See Section 3.3.6.1.1,
“Invalid Operation Exception Condition.”
10 VXZDZ Floating-point invalid operation exception for 0 ÷ 0. This is a sticky bit. See Section 3.3.6.1.1,
“Invalid Operation Exception Condition.”
11 VXIMZ Floating-point invalid operation exception for ∞ × 0. This is a sticky bit. See Section 3.3.6.1.1,
“Invalid Operation Exception Condition.”
12 VXVC Floating-point invalid operation exception for invalid compare. This is a sticky bit. See Section
3.3.6.1.1, “Invalid Operation Exception Condition.”


13 FR Floating-point fraction rounded. The last arithmetic, rounding, or conversion instruction 
incremented the fraction. See Section 3.3.5, “Rounding.” This bit is not sticky. 


14 FI Floating-point fraction inexact. The last arithmetic, rounding, or conversion instruction either
produced an inexact result during rounding or caused a disabled overflow exception. See
Section 3.3.5, “Rounding.” This is not a sticky bit. For more information regarding the
relationship between FPSCR[FI] and FPSCR[XX], see the description of the FPSCR[XX] bit.


15-19 FPRF Floating-point result flags. For arithmetic, rounding, and conversion instructions, the field is
based on the result placed into the target register, except that if any portion of the result is
undefined, the value placed here is undefined.

15 Floating-point result class descriptor (C). Arithmetic, rounding, and conversion
instructions may set this bit with the FPCC bits to indicate the class of the result as
shown in Table 3-10.

16-19 Floating-point condition code (FPCC). Floating-point compare instructions always
set one of the FPCC bits to one and the other three FPCC bits to zero. Arithmetic,
rounding, and conversion instructions may set the FPCC bits with the C bit to
indicate the class of the result. Note that in this case the high-order three bits of the
FPCC retain their relational significance indicating that the value is less than,
greater than, or equal to zero.
16 Floating-point less than or negative (FL or <)
17 Floating-point greater than or positive (FG or >)
18 Floating-point equal or zero (FE or =)
19 Floating-point unordered or NaN (FU or ?)

Note that these are not sticky bits.


20 Reserved
21 VXSOFT Floating-point invalid operation exception for software request. This is a sticky bit. This bit can
be altered only by the mcrfs, mtfsfi, mtfsf, mtfsb0, or mtfsb1 instructions. For more detailed
information, refer to Section 3.3.6.1.1, “Invalid Operation Exception Condition.”
22 VXSQRT Floating-point invalid operation exception for invalid square root. This is a sticky bit. For more
detailed information, refer to Section 3.3.6.1.1, “Invalid Operation Exception Condition.”
23 VXCVI Floating-point invalid operation exception for invalid integer convert. This is a sticky bit. See 
Section 3.3.6.1.1, “Invalid Operation Exception Condition.” 
24 VE Floating-point invalid operation exception enable. See Section 3.3.6.1.1, “Invalid Operation 
Exception Condition.” 
25 OE IEEE floating-point overflow exception enable. See Section 3.3.6.2, “Overflow, Underflow, and 
Inexact Exception Conditions.” 
26 UE IEEE floating-point underflow exception enable. See Section 3.3.6.2.2, “Underflow Exception
Condition.”


27 ZE IEEE floating-point zero divide exception enable. See Section 3.3.6.1.2, “Zero Divide 
Exception Condition.” 


28 XE Floating-point inexact exception enable. See Section 3.3.6.2.3, “Inexact Exception Condition.”


29 NI Floating-point non-IEEE mode. If this bit is set, results need not conform to IEEE standards
and the other FPSCR bits may have meanings other than those described here. If the bit is set
and if all implementation-specific requirements are met and if an IEEE-conforming result of a
floating-point operation would be a denormalized number, the result produced is zero
(retaining the sign of the denormalized number). Any other effects associated with setting this
bit are described in the user’s manual for the implementation.
Effects of the setting of this bit are implementation-dependent.


30-31 RN Floating-point rounding control. See Section 3.3.5, “Rounding.”
00 Round to nearest
01 Round toward zero
10 Round toward +infinity
11 Round toward -infinity
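The FEX and VX definitions in Table 3-9 are plain Boolean formulas over other FPSCR bits, so they can be recomputed from a saved FPSCR image. The following is a sketch of a software model only (the function and macro names are hypothetical); the processor maintains these summary bits itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* FPSCR bit n (bit 0 = most-significant) as a C mask. */
#define FPSCR_BIT(n) (UINT32_C(1) << (31 - (n)))
#define FPSCR_FEX FPSCR_BIT(1)
#define FPSCR_VX  FPSCR_BIT(2)
#define FPSCR_OX  FPSCR_BIT(3)
#define FPSCR_UX  FPSCR_BIT(4)
#define FPSCR_ZX  FPSCR_BIT(5)
#define FPSCR_XX  FPSCR_BIT(6)
#define FPSCR_VE  FPSCR_BIT(24)
#define FPSCR_OE  FPSCR_BIT(25)
#define FPSCR_UE  FPSCR_BIT(26)
#define FPSCR_ZE  FPSCR_BIT(27)
#define FPSCR_XE  FPSCR_BIT(28)

/* The nine invalid operation bits: 7-12 (VXSNAN..VXVC) and 21-23 (VXSOFT..VXCVI). */
#define FPSCR_VX_ALL                                                    \
    (FPSCR_BIT(7) | FPSCR_BIT(8) | FPSCR_BIT(9) | FPSCR_BIT(10) |       \
     FPSCR_BIT(11) | FPSCR_BIT(12) | FPSCR_BIT(21) | FPSCR_BIT(22) |    \
     FPSCR_BIT(23))

/* Recompute the non-sticky summary bits VX and FEX from the other bits. */
uint32_t fpscr_recompute_summaries(uint32_t f)
{
    f &= ~(FPSCR_VX | FPSCR_FEX);
    if (f & FPSCR_VX_ALL)
        f |= FPSCR_VX;                       /* VX = OR of all invalid operation bits */

    bool fex = ((f & FPSCR_VX) && (f & FPSCR_VE)) ||
               ((f & FPSCR_OX) && (f & FPSCR_OE)) ||
               ((f & FPSCR_UX) && (f & FPSCR_UE)) ||
               ((f & FPSCR_ZX) && (f & FPSCR_ZE)) ||
               ((f & FPSCR_XX) && (f & FPSCR_XE));
    if (fex)
        f |= FPSCR_FEX;                      /* some exception bit is set and enabled */
    return f;
}
```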





Table 3-10 illustrates the floating-point result flags used by PowerPC processors. The result 
flags correspond to FPSCR bits 15-19 (the FPRF field). 


Table 3-10. Floating-Point Result Flags — FPSCR[FPRF] 


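Result Flags (Bits 15-19)
C < > = ?     Result Value Class
1 0 0 0 1     Quiet NaN
0 1 0 0 1     -Infinity
0 1 0 0 0     -Normalized number
1 1 0 0 0     -Denormalized number
1 0 0 1 0     -Zero
0 0 0 1 0     +Zero
1 0 1 0 0     +Denormalized number
0 0 1 0 0     +Normalized number
0 0 1 0 1     +Infinity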
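Decoding FPRF into a result class amounts to a table lookup. This sketch assumes the five-bit FPRF value has already been extracted (C in the most-significant position, followed by FL, FG, FE, FU) and uses the standard PowerPC result-flag encodings that Table 3-10 lists; the enum and function names are hypothetical.

```c
/* Result value classes reported by FPSCR[FPRF]. */
typedef enum {
    FPRF_QNAN, FPRF_NEG_INF, FPRF_NEG_NORMAL, FPRF_NEG_DENORM, FPRF_NEG_ZERO,
    FPRF_POS_ZERO, FPRF_POS_DENORM, FPRF_POS_NORMAL, FPRF_POS_INF, FPRF_UNKNOWN
} fprf_class;

/* Decode a 5-bit FPRF value (C, FL, FG, FE, FU from most to least significant). */
fprf_class fprf_decode(unsigned fprf)
{
    switch (fprf & 0x1Fu) {
    case 0x11: return FPRF_QNAN;        /* 1 0001 */
    case 0x09: return FPRF_NEG_INF;     /* 0 1001 */
    case 0x08: return FPRF_NEG_NORMAL;  /* 0 1000 */
    case 0x18: return FPRF_NEG_DENORM;  /* 1 1000 */
    case 0x12: return FPRF_NEG_ZERO;    /* 1 0010 */
    case 0x02: return FPRF_POS_ZERO;    /* 0 0010 */
    case 0x14: return FPRF_POS_DENORM;  /* 1 0100 */
    case 0x04: return FPRF_POS_NORMAL;  /* 0 0100 */
    case 0x05: return FPRF_POS_INF;     /* 0 0101 */
    default:   return FPRF_UNKNOWN;     /* not a defined result-class encoding */
    }
}
```

Note how the FPCC bits keep their relational meaning: every negative class has FL set, every positive class has FG set, and both zeros have FE set.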
The following conditions that can cause program exceptions are detected by the processor.
These conditions may occur during execution of computational floating-point instructions. 
The corresponding bits set in the FPSCR are indicated in parentheses: 


• Invalid operation exception condition (VX)
  – SNaN condition (VXSNAN)
  – Infinity - infinity condition (VXISI)
  – Infinity ÷ infinity condition (VXIDI)
  – Zero ÷ zero condition (VXZDZ)
  – Infinity × zero condition (VXIMZ)
  – Invalid compare condition (VXVC)
  – Software request condition (VXSOFT)
  – Invalid integer convert condition (VXCVI)
  – Invalid square root condition (VXSQRT)


These exception conditions are described in Section 3.3.6.1.1, “Invalid Operation 
Exception Condition.” 


• Zero divide exception condition (ZX). This condition is described in
Section 3.3.6.1.2, “Zero Divide Exception Condition.”


• Overflow exception condition (OX). This condition is described in
Section 3.3.6.2.1, “Overflow Exception Condition.”


• Underflow exception condition (UX). This condition is described in
Section 3.3.6.2.2, “Underflow Exception Condition.”


• Inexact exception condition (XX). This condition is described in
Section 3.3.6.2.3, “Inexact Exception Condition.”


Each floating-point exception condition and each category of invalid IEEE floating-point
operation exception condition has a corresponding exception bit in the FPSCR that
indicates the occurrence of that condition. Generally, the occurrence of an exception
condition depends only on the instruction and its arguments (with one deviation, described
below). When one or more exception conditions arise during the execution of an
instruction, the way in which the instruction completes execution depends on the value of
the IEEE floating-point enable bits in the FPSCR that govern those exception conditions.
If no governing enable bit is set to 1, the instruction delivers a default result. Otherwise,
specific condition bits and the FX bit in the FPSCR are set and instruction execution is
completed by suppressing or delivering a result. Finally, after the instruction execution has
completed, a nonzero FEX bit in the FPSCR causes a program exception if either FE0 or FE1
is set in the MSR (invoking the system error handler). The values in the FPRs immediately
after the occurrence of an enabled exception do not depend on the FE0 and FE1 bits.


The floating-point exception summary bit (FX) in the FPSCR is set by any floating-point 
instruction (except mtfsfi and mtfsf) that causes any of the exception bits in the FPSCR to 
change from 0 to 1, or by mtfsfi, mtfsf, and mtfsb1 instructions that explicitly set one of 
these bits. FPSCR[FEX] is set when any of the exception condition bits is set and the 
exception is enabled (enable bit is one). 
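The sticky-bit rules for FX (above) and for XX (Table 3-9) can be modeled as a single update step applied when an instruction completes. This is a hypothetical sketch, assuming the instruction is neither mtfsf nor mtfsfi and reports the exception bits it raises as a mask; it deliberately does not recompute VX or FEX.

```c
#include <stdbool.h>
#include <stdint.h>

/* FPSCR bit n (bit 0 = most-significant) as a C mask. */
#define FPSCR_BIT(n) (UINT32_C(1) << (31 - (n)))
#define FPSCR_FX FPSCR_BIT(0)
#define FPSCR_XX FPSCR_BIT(6)
#define FPSCR_FI FPSCR_BIT(14)

/* Exception bits whose 0 -> 1 transition sets FX: OX, UX, ZX, XX (bits 3-6),
 * VXSNAN through VXVC (bits 7-12), and VXSOFT, VXSQRT, VXCVI (bits 21-23). */
#define FPSCR_EXC_BITS                                                  \
    (FPSCR_BIT(3) | FPSCR_BIT(4) | FPSCR_BIT(5) | FPSCR_BIT(6) |        \
     FPSCR_BIT(7) | FPSCR_BIT(8) | FPSCR_BIT(9) | FPSCR_BIT(10) |       \
     FPSCR_BIT(11) | FPSCR_BIT(12) | FPSCR_BIT(21) | FPSCR_BIT(22) |    \
     FPSCR_BIT(23))

/* `raised` holds the exception bits the instruction signals; `fi` is its new
 * FPSCR[FI] value. */
uint32_t fpscr_after_instruction(uint32_t old, uint32_t raised, bool fi)
{
    uint32_t f = old | (raised & FPSCR_EXC_BITS);   /* exception bits are sticky */
    if (fi)
        f |= FPSCR_FI | FPSCR_XX;                   /* XX = old XX OR new FI */
    else
        f &= ~FPSCR_FI;                             /* FI itself is not sticky */
    if ((f & ~old) & FPSCR_EXC_BITS)
        f |= FPSCR_FX;                              /* some bit went from 0 to 1 */
    return f;
}
```

A second inexact result after software has cleared FX leaves FX clear, because XX is already 1 and therefore does not transition.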


A single instruction may set more than one exception condition bit only in the following
cases:


• The inexact exception condition bit (FPSCR[XX]) may be set with the overflow
exception condition bit (FPSCR[OX]).


• The inexact exception condition bit (FPSCR[XX]) may be set with the underflow
exception condition bit (FPSCR[UX]).


• The invalid IEEE floating-point operation exception condition bit (SNaN) may be
set with the invalid IEEE floating-point operation exception condition bit (∞ × 0)
(FPSCR[VXIMZ]) for multiply-add instructions.


• The invalid operation exception condition bit (SNaN) may be set with the invalid
IEEE floating-point operation exception condition bit (invalid compare)
(FPSCR[VXVC]) for compare ordered instructions.


• The invalid IEEE floating-point operation exception condition bit (SNaN) may be
set with the invalid IEEE floating-point operation exception condition bit (invalid
integer convert) (FPSCR[VXCVI]) for convert-to-integer instructions.


Instruction execution is suppressed for the following kinds of exception conditions, so that 
there is no possibility that one of the operands is lost: 

• Enabled invalid IEEE floating-point operation

• Enabled zero divide


For the remaining kinds of exception conditions, a result is generated and written to the 
destination specified by the instruction causing the exception condition. The result may 
depend on whether the condition is enabled or disabled. The kinds of exception conditions 
that deliver a result are the following: 

• Disabled invalid IEEE floating-point operation

• Disabled zero divide

• Disabled overflow

• Disabled underflow

• Disabled inexact

• Enabled overflow

• Enabled underflow

• Enabled inexact


Subsequent sections define each of the floating-point exception conditions and specify the 
action taken when they are detected. 


The IEEE standard specifies the handling of exception conditions in terms of traps and trap 
handlers. In the PowerPC architecture, an FPSCR exception enable bit being set causes 
generation of the result value specified in the IEEE standard for the trap enabled case—the 
expectation is that the exception is detected by software, which will revise the result. An 
FPSCR exception enable bit of 0 causes generation of the default result value specified for 
the trap disabled (or no trap occurs or trap is not implemented) case—the expectation is that 
the exception will not be detected by software, which will simply use the default result. The 
result to be delivered in each case for each exception is described in the following sections. 
The IEEE default behavior when an exception occurs, which is to generate a default value
and not to notify software, is obtained by clearing all FPSCR exception enable bits and 
using ignore exceptions mode (see Table 3-11). In this case the system floating-point 
enabled exception error handler is not invoked, even if floating-point exceptions occur. If 
necessary, software can inspect the FPSCR exception bits to determine whether exceptions 
have occurred. 


If the system error handler is to be invoked, the corresponding FPSCR exception enable bit 
must be set and a mode other than ignore exceptions mode must be used. In this case the 
system floating-point enabled exception error handler is invoked if an enabled floating-point
exception condition occurs.


Whether and how the system floating-point enabled exception error handler is invoked if an 
enabled floating-point exception occurs is controlled by MSR bits FE0 and FE1 as shown
in Table 3-11. (The system floating-point enabled exception error handler is never invoked 
if the appropriate floating-point exception is disabled.) 


Table 3-11. MSR[FE0] and MSR[FE1] Bit Settings for FP Exceptions
FE0 | FE1 | Mode
0 | 0 | Ignore exceptions mode: Floating-point exceptions do not cause the program exception error handler to be invoked.
0 | 1 | Imprecise nonrecoverable mode: When an exception occurs, the exception handler is invoked at some point at or beyond the instruction that caused the exception. It may not be possible to identify the excepting instruction or the data that caused the exception. Results from the excepting instruction may have been used by or affected subsequent instructions executed before the exception handler was invoked.
1 | 0 | Imprecise recoverable mode: When an enabled exception occurs, the floating-point enabled exception handler is invoked at some point at or beyond the instruction that caused the exception. Sufficient information is provided to the exception handler that it can identify the excepting instruction and correct any faulty results. In this mode, no results caused by the excepting instruction have been used by or affected subsequent instructions that are executed before the exception handler is invoked.
1 | 1 | Precise mode: The system floating-point enabled exception error handler is invoked precisely at the instruction that caused the enabled exception.
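Table 3-11 reduces to a two-bit decode, and the condition for taking a floating-point enabled program exception (FPSCR[FEX] = 1 and MSR[FE0-FE1] ≠ 00) follows from it. To avoid assuming MSR bit positions, this sketch takes FE0 and FE1 as separate arguments rather than decoding an MSR image; all names are hypothetical.

```c
#include <stdbool.h>

/* Modes of Table 3-11, numbered so that the value equals (FE0 << 1) | FE1. */
typedef enum {
    FP_IGNORE_EXCEPTIONS  = 0,  /* FE0 = 0, FE1 = 0 */
    FP_IMPRECISE_NONRECOV = 1,  /* FE0 = 0, FE1 = 1 */
    FP_IMPRECISE_RECOV    = 2,  /* FE0 = 1, FE1 = 0 */
    FP_PRECISE            = 3   /* FE0 = 1, FE1 = 1 */
} fp_exception_mode;

fp_exception_mode fp_mode(unsigned fe0, unsigned fe1)
{
    return (fp_exception_mode)(((fe0 & 1u) << 1) | (fe1 & 1u));
}

/* An enabled floating-point exception invokes the handler only when FEX is
 * set and the processor is not in ignore exceptions mode. */
bool fp_enabled_exception_taken(bool fex, unsigned fe0, unsigned fe1)
{
    return fex && fp_mode(fe0, fe1) != FP_IGNORE_EXCEPTIONS;
}
```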


In precise mode, whenever the system floating-point enabled exception error handler is 
invoked, the architecture ensures that all instructions logically residing before the excepting 
instruction have completed and no instruction after the excepting instruction has been 
executed. In an imprecise mode, the instruction flow may not be interrupted at the point of 
the instruction that caused the exception. The instruction at which the system floating-point 
exception handler is invoked has not been executed unless it is the excepting instruction and 
the exception is not suppressed. 


In either of the imprecise modes, an FPSCR instruction can be used to force the occurrence 
of any invocations of the floating-point enabled exception handler, due to instructions 
initiated before the FPSCR instruction. This forcing has no effect in ignore exceptions 
mode and is superfluous for precise mode. 
Instead of using an FPSCR instruction, an execution synchronizing instruction or event can
be used to force exceptions and set bits in the FPSCR; however, for the best performance 
across the widest range of implementations, an FPSCR instruction should be used to 
achieve these effects. 


For the best performance across the widest range of implementations, the following 
guidelines should be considered: 


• If IEEE default results are acceptable to the application, FE0 and FE1 should be
cleared (ignore exceptions mode). All FPSCR exception enable bits should be
cleared.


• If IEEE default results are unacceptable to the application, an imprecise mode
should be used with the FPSCR enable bits set as needed.


• Ignore exceptions mode should not, in general, be used when any FPSCR exception
enable bits are set.


• Precise mode may degrade performance in some implementations, perhaps
substantially, and therefore should be used only for debugging and other specialized
applications.


3.3.6.1 Invalid Operation and Zero Divide Exception Conditions 

The flow diagram in Figure 3-23 shows the initial flow for checking floating-point 
exception conditions (invalid operation and divide by zero conditions). In any of these cases 
of floating-point exception conditions, if the FPSCR[FEX] bit is set (implicitly) and 
MSR[FE0-FE1] ≠ 00, the processor takes a program exception (floating-point enabled
exception type). Refer to Chapter 6, “Exceptions,” for more information on exception 
processing. The actions performed for each floating-point exception condition are 
described in greater detail in the following sections. 
[Figure 3-23 (flow diagram): initial checks for the invalid operation and zero divide exception conditions; the flow then continues to the checks for overflow, underflow, and inexact exception conditions (see Figure 3-24).]
Figure 3-23. Initial Flow for Floating-Point Exception Conditions
3.3.6.1.1 Invalid Operation Exception Condition
An invalid operation exception occurs when an operand is invalid for the specified 
operation. The invalid operations are as follows: 
• Any operation except load, store, move, select, or mtfsf on a signaling NaN (SNaN)
• For add or subtract operations, magnitude subtraction of infinities (∞ - ∞)
• Division of infinity by infinity (∞ ÷ ∞)
• Division of zero by zero (0 ÷ 0)
• Multiplication of infinity by zero (∞ × 0)
• Ordered comparison involving a NaN (invalid compare)


• Square root or reciprocal square root of a negative, nonzero number (invalid square
root). Note that if the implementation does not support the optional floating-point
square root or floating-point reciprocal square root estimate instructions, software
can simulate the instruction and set the FPSCR[VXSQRT] bit to reflect the
exception.


• Integer convert involving a number that is too large in magnitude to be represented
in the target format, or involving an infinity or a NaN (invalid integer convert)


FPSCR[VXSOFT] allows software to cause an invalid operation exception for a condition 
that is not necessarily associated with the execution of a floating-point instruction. For 
example, it might be set by a program that computes a square root if the source operand is 
negative. This allows PowerPC instructions not implemented in hardware to be emulated. 


Any time an invalid operation occurs or software explicitly requests the exception via
FPSCR[VXSOFT] (regardless of the value of FPSCR[VE]), the following actions are
taken:


• One or two invalid operation exception condition bits are set:


FPSCR[VXSNAN] (if SNaN)
FPSCR[VXISI] (if ∞ - ∞)
FPSCR[VXIDI] (if ∞ ÷ ∞)
FPSCR[VXZDZ] (if 0 ÷ 0)
FPSCR[VXIMZ] (if ∞ × 0)
FPSCR[VXVC] (if invalid comparison)
FPSCR[VXSOFT] (if software request)
FPSCR[VXSQRT] (if invalid square root)
FPSCR[VXCVI] (if invalid integer convert)


• If the operation is a compare,
FPSCR[FR, FI, C] are unchanged
FPSCR[FPCC] is set to reflect unordered


• If software explicitly requests the exception,
FPSCR[FR, FI, FPRF] are as set by the mtfsfi, mtfsf, or mtfsb1 instruction.
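The invalid operation conditions that a multiply-add can raise, including the case where SNaN and ∞ × 0 are signaled together, can be sketched in C on IEEE 754 binary64 values, following the frA × frC + frB form of the multiply-add instructions. This assumes the host uses IEEE 754 doubles, where a NaN is signaling when its most-significant fraction bit (bit 51) is 0; `is_snan` and `fmadd_invalid_bits` are hypothetical names, and only VXSNAN, VXIMZ, and VXISI are modeled.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

#define FPSCR_BIT(n) (UINT32_C(1) << (31 - (n)))
#define FPSCR_VXSNAN FPSCR_BIT(7)
#define FPSCR_VXISI  FPSCR_BIT(8)
#define FPSCR_VXIMZ  FPSCR_BIT(11)

/* A binary64 NaN is signaling when its most-significant fraction bit is 0. */
int is_snan(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return isnan(x) && (bits & (UINT64_C(1) << 51)) == 0;
}

/* Invalid operation bits for a * c + b. */
uint32_t fmadd_invalid_bits(double a, double c, double b)
{
    uint32_t bits = 0;
    if (is_snan(a) || is_snan(b) || is_snan(c))
        bits |= FPSCR_VXSNAN;
    if ((isinf(a) && c == 0.0) || (a == 0.0 && isinf(c))) {
        bits |= FPSCR_VXIMZ;                       /* infinity * 0 */
    } else if (!isnan(a) && !isnan(c) && (isinf(a) || isinf(c)) && isinf(b)) {
        /* The product is infinite; its sign is the XOR of the factor signs. */
        int prod_negative = (signbit(a) != 0) != (signbit(c) != 0);
        if (prod_negative != (signbit(b) != 0))
            bits |= FPSCR_VXISI;                   /* magnitude subtraction of infinities */
    }
    return bits;
}
```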
Additional actions that depend on the value of FPSCR[VE] are described in Table 3-12.


Table 3-12. Additional Actions Performed for Invalid FP Operations
Invalid Operation | Result Category | Action Performed (FPSCR[VE] = 1) | Action Performed (FPSCR[VE] = 0)
Arithmetic or floating-point round to single | frD | Unchanged | QNaN
 | FPSCR[FPRF] | Unchanged | Set for QNaN
Convert to 64-bit integer (positive number or +∞) | frD[0-63] | Unchanged | Most positive 64-bit integer value
 | FPSCR[FPRF] | Unchanged | Undefined
Convert to 64-bit integer (negative number, NaN, or -∞) | frD[0-63] | Unchanged | Most negative 64-bit integer value
 | FPSCR[FPRF] | Unchanged | Undefined
Convert to 32-bit integer (positive number or +∞) | frD[0-31] | Unchanged | Undefined
 | frD[32-63] | Unchanged | Most positive 32-bit integer value
 | FPSCR[FPRF] | Unchanged | Undefined
Convert to 32-bit integer (negative number, NaN, or -∞) | frD[0-31] | Unchanged | Undefined
 | frD[32-63] | Unchanged | Most negative 32-bit integer value
 | FPSCR[FPRF] | Unchanged | Undefined
All cases | FPSCR[FEX] | Implicitly set (causes exception) | Unchanged
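With FPSCR[VE] = 0, a convert-to-integer instruction delivers a saturated value instead of invoking the handler. The sketch below models only the low-order 32-bit result (frD[32-63]) of a round-toward-zero conversion such as fctiwz, assuming IEEE 754 doubles on the host; `fctiwz_model` is a hypothetical name, and frD[0-31] (undefined in this case) is not modeled.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert x to a 32-bit integer, rounding toward zero, with FPSCR[VE] = 0.
 * Returns the value for frD[32-63]; *vxcvi reports whether FPSCR[VXCVI]
 * would be set. */
int32_t fctiwz_model(double x, bool *vxcvi)
{
    if (isnan(x) || x <= -2147483649.0) {   /* NaN, -inf, or too negative */
        *vxcvi = true;
        return INT32_MIN;                    /* most negative 32-bit integer */
    }
    if (x >= 2147483648.0) {                 /* +inf or too large */
        *vxcvi = true;
        return INT32_MAX;                    /* most positive 32-bit integer */
    }
    *vxcvi = false;
    return (int32_t)x;                       /* C conversion truncates toward zero */
}
```

The bounds are exclusive because truncation maps any value strictly between -2^31 - 1 and 2^31 into the 32-bit range.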


3.3.6.1.2 Zero Divide Exception Condition 

A zero divide exception condition occurs when a divide instruction is executed with a zero
divisor value and a finite, nonzero dividend value, or when an fres or frsqrte instruction is
executed with a zero operand value. This condition indicates an exact infinite result from
finite operands, corresponding to a mathematical pole (divide or fres) or a branch point
singularity (frsqrte).
When a zero divide condition occurs, the following actions are taken:


• The zero divide exception condition bit is set: FPSCR[ZX] = 1.
• FPSCR[FR, FI] are cleared.


Additional actions depend on the setting of the zero divide exception condition enable bit, 
FPSCR[ZE], as described in Table 3-13. 


Table 3-13. Additional Actions Performed for Zero Divide
Result Category | Action Performed (FPSCR[ZE] = 1) | Action Performed (FPSCR[ZE] = 0)
frD | Unchanged | ±∞ (sign determined by XOR of the signs of the operands)
FPSCR[FEX] | Implicitly set (causes exception) | Unchanged
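The distinction between a zero divide (ZX) and the 0 ÷ 0 and ∞ ÷ ∞ invalid operations, together with the ZE = 0 default result of Table 3-13, can be sketched as follows. This assumes IEEE 754 doubles on the host and ignores SNaN operands; the function names are hypothetical.

```c
#include <math.h>
#include <stdint.h>

#define FPSCR_BIT(n) (UINT32_C(1) << (31 - (n)))
#define FPSCR_ZX    FPSCR_BIT(5)
#define FPSCR_VXIDI FPSCR_BIT(9)
#define FPSCR_VXZDZ FPSCR_BIT(10)

/* Condition bit raised by a divide a / b (SNaN operands are not modeled). */
uint32_t fdiv_condition_bits(double a, double b)
{
    if (isinf(a) && isinf(b))
        return FPSCR_VXIDI;          /* infinity / infinity: invalid operation */
    if (a == 0.0 && b == 0.0)
        return FPSCR_VXZDZ;          /* 0 / 0: invalid operation */
    if (b == 0.0 && isfinite(a))
        return FPSCR_ZX;             /* finite nonzero / 0: zero divide */
    return 0;
}

/* Default result with FPSCR[ZE] = 0: an infinity whose sign is the XOR of
 * the operand signs (signbit also distinguishes -0.0 from +0.0). */
double zero_divide_default(double a, double b)
{
    int negative = (signbit(a) != 0) != (signbit(b) != 0);
    return negative ? -INFINITY : INFINITY;
}
```

Note that ∞ ÷ 0 raises neither condition here, because the dividend is not finite and the infinite result is exact.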





3.3.6.2 Overflow, Underflow, and Inexact Exception Conditions 

As described earlier, the overflow, underflow, and inexact exception conditions are detected 
after the floating-point instruction has executed and an infinitely precise result with 
unbounded range has been computed. Figure 3-24 shows the flow for the detection of these 
conditions and is a continuation of Figure 3-23. As in the cases of invalid operation or zero
divide conditions, if the FPSCR[FEX] bit is implicitly set as described in Table 3-9 and
MSR[FE0-FE1] ≠ 00, the processor takes a program exception (floating-point enabled
exception type). Refer to Chapter 6, “Exceptions,” for more information on exception 
processing. The actions performed for each of these floating-point exception conditions 
(including the generated result) are described in greater detail in the following sections. 
Figure 3-24. Checking of Remaining Floating-Point Exception Conditions
3.3.6.2.1 Overflow Exception Condition

Overflow occurs when the magnitude of what would have been the rounded result (had the 
exponent range been unbounded) is greater than the magnitude of the largest finite number 
of the specified result precision. Regardless of the setting of the overflow exception 
condition enable bit of the FPSCR, the following action is taken: 


• The overflow exception condition bit is set: FPSCR[OX] = 1.


Additional actions are taken that depend on the setting of the overflow exception condition 
enable bit of the FPSCR as described in Table 3-14. 


Table 3-14. Additional Actions Performed for Overflow Exception Condition
Condition | Result Category | Action Performed (FPSCR[OE] = 1) | Action Performed (FPSCR[OE] = 0)
Double-precision arithmetic instructions | Exponent of normalized intermediate result | Adjusted by subtracting 1536 |
Single-precision arithmetic and frspx instruction | Exponent of normalized intermediate result | Adjusted by subtracting 192 |
All cases | frD | Rounded result (with adjusted exponent) | Default result per Table 3-15
 | FPSCR[XX] | Set if rounded result differs from intermediate result | Set
 | FPSCR[FEX] | Implicitly set (causes exception) | Unchanged
 | FPSCR[FPRF] | Set to indicate ±normal number | Set to indicate ±∞ or ±normal number
 | FPSCR[FI] | Reflects rounding | Set
 | FPSCR[FR] | Reflects rounding | Undefined
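As a worked example of the exponent adjustment for an enabled overflow (FPSCR[OE] = 1), consider products of exact powers of two, so that rounding leaves the significand unchanged. The operands are illustrative choices, not taken from the architecture definition.

```latex
\begin{aligned}
&\text{Double precision: } 2^{1000} \times 2^{100} = 2^{1100} \ge 2^{1024}
  \;\Rightarrow\; \mathrm{frD} = 2^{1100 - 1536} = 2^{-436} \\
&\text{Single precision: } 2^{100} \times 2^{100} = 2^{200} \ge 2^{128}
  \;\Rightarrow\; \mathrm{frD} = 2^{200 - 192} = 2^{8}
\end{aligned}
```

The handler can recover the true magnitude by adding the constant back to the delivered exponent.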








When the overflow exception condition is disabled (FPSCR[OE] = 0) and an overflow 
condition occurs, the default result is determined by the rounding mode bit (FPSCR[RN]) 
and the sign of the intermediate result as shown in Table 3-15. 
Table 3-15. Target Result for Overflow Exception Disabled Case
Rounding Mode | Sign of Intermediate Result | frD
Round to nearest | Positive | +Infinity
 | Negative | -Infinity
Round toward zero | Positive | Format’s largest finite positive number
 | Negative | Format’s most negative finite number
Round toward +infinity | Positive | +Infinity
 | Negative | Format’s most negative finite number
Round toward -infinity | Positive | Format’s largest finite positive number
 | Negative | -Infinity
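Table 3-15 can be expressed as a function of the rounding mode and the sign of the intermediate result. This sketch assumes a double-precision target, so the format's largest finite number is `DBL_MAX` from `<float.h>`; `overflow_default` and the `RN_*` constants are hypothetical names.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* FPSCR[RN] encodings (Table 3-8). */
enum { RN_NEAREST = 0, RN_TOWARD_ZERO = 1, RN_TOWARD_PINF = 2, RN_TOWARD_MINF = 3 };

/* Default double-precision result for a disabled overflow (FPSCR[OE] = 0). */
double overflow_default(bool negative, unsigned rn)
{
    switch (rn & 3u) {
    case RN_NEAREST:
        return negative ? -INFINITY : INFINITY;
    case RN_TOWARD_ZERO:
        return negative ? -DBL_MAX : DBL_MAX;
    case RN_TOWARD_PINF:
        return negative ? -DBL_MAX : INFINITY;
    default: /* RN_TOWARD_MINF */
        return negative ? -INFINITY : DBL_MAX;
    }
}
```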


3.3.6.2.2 Underflow Exception Condition 
The underflow exception condition is defined separately for the enabled and disabled states: 





• Enabled: Underflow occurs when the intermediate result is tiny.


• Disabled: Underflow occurs when the intermediate result is tiny and the rounded
result is inexact.


In this context, the term ‘tiny’ refers to a floating-point value that is too small to be 
represented for a particular precision format. 


As shown in Figure 3-24, a tiny result is detected before rounding, when a nonzero 
intermediate result value computed as though it had infinite precision and unbounded 
exponent range is less in magnitude than the smallest normalized number. 


If the intermediate result is tiny and the underflow exception condition enable bit is cleared 
(FPSCR[UE] = 0), the intermediate result is denormalized (see Section 3.3.3,
“Normalization and Denormalization”) and rounded (see Section 3.3.5, “Rounding”)
before being stored in an FPR. In this case, if the rounding causes the delivered result value 
to differ from what would have been computed were both the exponent range and precision 
unbounded (the result is inexact), then underflow occurs and FPSCR[UX] is set. 
The actions performed for underflow exception conditions are described in Table 3-16.


Table 3-16. Actions Performed for Underflow Conditions
Condition | Result Category | Action Performed (FPSCR[UE] = 1) | Action Performed (FPSCR[UE] = 0)
Double-precision arithmetic instructions | Exponent of normalized intermediate result | Adjusted by adding 1536 |
Single-precision arithmetic and frspx instructions | Exponent of normalized intermediate result | Adjusted by adding 192 |
All cases | frD | Rounded result (with adjusted exponent) | Denormalized and rounded result
 | FPSCR[XX] | Set if rounded result differs from intermediate result | Set if rounded result differs from intermediate result
 | FPSCR[UX] | Set | Set only if tiny and inexact after denormalization and rounding
 | FPSCR[FPRF] | Set to indicate ±normalized number | Set to indicate ±denormalized number or ±zero
 | FPSCR[FEX] | Implicitly set (causes exception) | Unchanged
 | FPSCR[FI] | Reflects rounding | Reflects rounding
 | FPSCR[FR] | Reflects rounding | Reflects rounding
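The corresponding adjustment for an enabled underflow (FPSCR[UE] = 1) adds the same constants. Again the operands are exact powers of two chosen only for illustration, so the significand is exact and only the exponent changes.

```latex
\begin{aligned}
&\text{Double precision: } 2^{-1000} \times 2^{-100} = 2^{-1100} < 2^{-1022}
  \;\Rightarrow\; \mathrm{frD} = 2^{-1100 + 1536} = 2^{436} \\
&\text{Single precision: } 2^{-100} \times 2^{-100} = 2^{-200} < 2^{-126}
  \;\Rightarrow\; \mathrm{frD} = 2^{-200 + 192} = 2^{-8}
\end{aligned}
```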


Note that the FR and FI bits in the FPSCR allow the system floating-point enabled 
exception error handler, when invoked because of an underflow exception condition, to 
simulate a trap disabled environment. That is, the FR and FI bits allow the system floating-point
enabled exception error handler to unround the result, thus allowing the result to be
denormalized. 





3.3.6.2.3 Inexact Exception Condition 
The inexact exception condition occurs when either of two conditions arises during rounding:
• The rounded result differs from the intermediate result assuming the intermediate
result exponent range and precision to be unbounded. (In the case of an enabled
overflow or underflow condition, where the exponent of the rounded result is
adjusted for those conditions, an inexact condition occurs only if the significand of
the rounded result differs from that of the intermediate result.)


• The rounded result overflows and the overflow exception condition is disabled.
When an inexact exception condition occurs, the following actions are taken independently
of the setting of the inexact exception condition enable bit of the FPSCR:

• The inexact exception condition bit in the FPSCR is set: FPSCR[XX] = 1.

• The rounded or overflowed result is placed into the target FPR.

• FPSCR[FPRF] is set to indicate the class and sign of the result.


In addition, if the inexact exception condition enable bit in the FPSCR (FPSCR[XE]) is set,
and an inexact condition exists, then the FPSCR[FEX] bit is implicitly set, causing the 
processor to take a floating-point enabled program exception. 


In PowerPC implementations, enabling inexact exception conditions may cause greater
latency than enabling other types of floating-point exception conditions.
Chapter 4
Addressing Modes and Instruction Set 
Summary 


This chapter describes instructions and addressing modes defined by the three levels of the
PowerPC architecture: user instruction set architecture (UISA), virtual environment
architecture (VEA), and operating environment architecture (OEA). These instructions are
divided into the following functional categories:


• Integer instructions—These include arithmetic and logical instructions. For more 
information, see Section 4.2.1, “Integer Instructions.” 


• Floating-point instructions—These include floating-point arithmetic instructions, as 
well as instructions that affect the floating-point status and control register (FPSCR). 
For more information, see Section 4.2.2, “Floating-Point Instructions.” 


• Load and store instructions—These include integer and floating-point load and store 
instructions. For more information, see Section 4.2.3, “Load and Store Instructions.” 


• Flow control instructions—These include branching instructions, condition register 
logical instructions, trap instructions, and other instructions that affect the 
instruction flow. For more information, see Section 4.2.4, “Branch and Flow Control 
Instructions.” 


• Processor control instructions—These instructions are used for synchronizing 
memory accesses and managing caches, TLBs, and the segment registers. For 
more information, see Section 4.2.5, “Processor Control Instructions—UISA,” 
Section 4.3.1, “Processor Control Instructions—VEA,” and Section 4.4.2, 
“Processor Control Instructions—OEA.” 


• Memory synchronization instructions—These instructions control the order in 
which memory operations are completed with respect to asynchronous events, and 
the order in which memory operations are seen by other processors or memory 
access mechanisms. For more information, see Section 4.2.6, “Memory 
Synchronization Instructions—UISA,” and Section 4.3.2, “Memory 
Synchronization Instructions—VEA.” 


• Memory control instructions—These include cache management instructions (user-level 
and supervisor-level), segment register manipulation instructions, and 
translation lookaside buffer management instructions. For more information, see 
Section 4.3.3, “Memory Control Instructions—VEA,” and Section 4.4.3, “Memory 
Control Instructions—OEA.” (Note that user-level and supervisor-level are referred 
to as problem state and privileged state, respectively, in the architecture 
specification.) 


• External control instructions—These instructions allow a user-level program to 
communicate with a special-purpose device. For more information, see 
Section 4.3.4, “External Control Instructions.” 


This grouping of instructions does not necessarily indicate the execution unit that processes 
a particular instruction or group of instructions within a processor implementation. 


Integer instructions operate on byte, half-word, and word operands. Floating-point 
instructions operate on single-precision and double-precision floating-point operands. The 
PowerPC architecture uses instructions that are four bytes long and word-aligned. It 
provides for byte, half-word, and word operand fetches and stores between memory and a 
set of 32 general-purpose registers (GPRs). It also provides for word and double-word 
operand fetches and stores between memory and a set of 32 floating-point registers (FPRs). 
The FPRs are 64 bits wide in all PowerPC implementations. The GPRs are 32 bits wide in 
32-bit implementations and 64 bits wide in 64-bit implementations. 


Arithmetic and logical instructions do not read or modify memory. To use the contents of a 
memory location in a computation and then modify the same or another memory location, 
the memory contents must be loaded into a register, modified, and then written to the target 
location using load and store instructions. 


The description of each instruction includes the mnemonic and a formatted list of operands. 
PowerPC-compliant assemblers support the mnemonics and operand lists. To simplify 
assembly language programming, a set of simplified mnemonics (referred to as extended 
mnemonics in the architecture specification) and symbols is provided for some of the most 
frequently used instructions; see Appendix F, “Simplified Mnemonics,” for a complete list 
of simplified mnemonics. 


The instructions are organized by functional categories while maintaining the delineation 
of the three levels of the PowerPC architecture (UISA, VEA, and OEA): Section 4.2 
discusses the UISA instructions, Section 4.3 the VEA instructions, and Section 4.4 the 
OEA instructions. See Section 1.1.2, “The Levels of the PowerPC Architecture,” for more 
information about the various levels defined by the PowerPC architecture. 
4.1 Conventions 


This section describes conventions used for the PowerPC instruction set. Descriptions of 
computation modes, memory addressing, synchronization, and the PowerPC exception 
summary follow. 


4.1.1 Sequential Execution Model 

The PowerPC processors appear to execute instructions in program order, regardless of 
asynchronous events or program exceptions. The execution of a sequence of instructions 
may be interrupted by an exception caused by one of the instructions in the sequence, or by 
an asynchronous event. (Note that the architecture specification refers to exceptions as 
interrupts.) 


For exceptions to the sequential execution model, refer to Chapter 6, “Exceptions.” For 
information about the synchronization required when using store instructions to access 
instruction areas of memory, refer to Section 4.2.3.3, “Integer Store Instructions,” and 
Section 5.1.5.2, “Instruction Cache Instructions.” For information about instruction 
fetching and guarded memory, refer to Section 5.2.1.5, “The Guarded Attribute (G).” 


4.1.2 Computation Modes 
The PowerPC architecture allows for the following types of implementations: 
• 64-bit implementations, in which all general-purpose and floating-point registers 
and some special-purpose registers (SPRs) are 64 bits long, and effective addresses 
are 64 bits long. All 64-bit implementations have two modes of operation: 64-bit 
mode (which is the default) and 32-bit mode. The mode controls how the effective 
address is interpreted, how condition bits are set, and how the count register (CTR) 
is tested by branch conditional instructions. All instructions provided for 64-bit 
implementations are available in both 64- and 32-bit modes. 


• 32-bit implementations, in which all registers except the FPRs are 32 bits long, and 
effective addresses are 32 bits long. 


This chapter describes only the instructions defined for 32-bit implementations. 
Instructions defined only for 64-bit implementations are illegal in 32-bit implementations, 
and vice versa. 


4.1.3 Classes of Instructions 

PowerPC instructions belong to one of the following three classes: 
• Defined 
• Illegal 
• Reserved 

Note that while the definitions of these terms are consistent among the PowerPC 
processors, the assignment of these classifications is not. For example, an instruction that 
is specific to 64-bit implementations is considered defined for 64-bit implementations but 
illegal for 32-bit implementations. 


The class is determined by examining the primary opcode, and the extended opcode if any. 
If the opcode, or the combination of opcode and extended opcode, is not that of a defined 
instruction or of a reserved instruction, the instruction is illegal. 
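Because the class depends only on the opcode fields, both fields can be extracted directly from the 32-bit instruction word. This sketch assumes the bit numbering used in this book (bit 0 is the most-significant bit of the word) and the X-form layout, in which the extended opcode occupies bits 21-30. The function names are hypothetical.

```c
#include <stdint.h>

/* Primary opcode: instruction bits 0-5, the six most-significant bits. */
unsigned primary_opcode(uint32_t insn)
{
    return insn >> 26;
}

/* X-form extended opcode: instruction bits 21-30 (10 bits).
 * Bit 31 is the Rc (record) bit and is not part of the opcode.
 * XO-form instructions such as add use only bits 22-30, with OE in bit 21,
 * so they need a 9-bit extraction instead. */
unsigned x_form_extended_opcode(uint32_t insn)
{
    return (insn >> 1) & 0x3FFu;
}
```

For example, and r3,r4,r5 encodes as 0x7C83_2838, which yields primary opcode 31 and extended opcode 28; the preferred no-op, ori 0,0,0 (0x6000_0000), has primary opcode 24.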


In future versions of the PowerPC architecture, instruction codings that are now illegal may 
become defined (by being added to the architecture) or reserved (by being assigned to one 
of the special purposes). Likewise, reserved instructions may become defined. 


4.1.3.1 Definition of Boundedly Undefined 


The results of executing a given instruction are said to be boundedly undefined if they could 
have been achieved by executing an arbitrary sequence of instructions, starting in the state 
the machine was in before executing the given instruction. Boundedly undefined results for 
a given instruction may vary between implementations, and between different executions 
on the same implementation. 


4.1.3.2 Defined Instruction Class 

Defined instructions contain all the instructions defined in the PowerPC UISA, VEA, and 
OEA. Defined instructions are guaranteed to be supported in all PowerPC implementations. 
The only exceptions are instructions that are defined only for 64-bit implementations, 
instructions that are defined only for 32-bit implementations, and optional instructions, as 
stated in the instruction descriptions in Chapter 8, “Instruction Set.” A PowerPC processor 
may invoke the illegal instruction error handler (part of the program exception handler) 
when an unimplemented PowerPC instruction is encountered so that it may be emulated in 
software, as required. 


A defined instruction can have invalid forms, as described in Section 4.1.3.2.2, “Invalid 
Instruction Forms.” 


4.1.3.2.1 Preferred Instruction Forms 
A defined instruction may have an instruction form that is preferred (that is, the instruction 
will execute in an efficient manner). Any form other than the preferred form will take 
significantly longer to execute. The following instructions have preferred forms: 

• Load/store multiple instructions 

• Load/store string instructions 

• Or immediate instruction (preferred form of no-op) 
4.1.3.2.2 Invalid Instruction Forms 


A defined instruction may have an instruction form that is invalid if one or more operands, 
excluding opcodes, are coded incorrectly in a manner that can be deduced by examining 
only the instruction encoding (primary and extended opcodes). Attempting to execute an 
invalid form of an instruction either invokes the illegal instruction error handler (a program 
exception) or yields boundedly undefined results. See Chapter 8, “Instruction Set,” for 
individual instruction descriptions. 


For example, invalid forms result when a bit or operand is coded incorrectly or when a 
reserved bit (shown as ‘0’) is coded as ‘1’. 


The following instructions have invalid forms identified in their individual instruction 
descriptions: 

• Branch conditional instructions 

• Load/store with update instructions 

• Load multiple instructions 

• Load string instructions 

• Integer compare instructions (in 32-bit implementations only) 

• Load/store floating-point with update instructions 


4.1.3.2.3 Optional Instructions 
A defined instruction may be optional. The optional instructions fall into the following 
categories: 
• General-purpose instructions—fsqrt and fsqrts 
• Graphics instructions—fres, frsqrte, and fsel 
• External control instructions—eciwx and ecowx 
• Lookaside buffer management instructions—tlbia, tlbie, and tlbsync (with 
conditions; see Chapter 8, “Instruction Set,” for more information) 


Note that the stfiwx instruction is defined as optional by the PowerPC architecture to ensure 
backwards compatibility with earlier processors; however, it will likely be required for 
subsequent PowerPC processors. 


Also, note that additional categories may be defined in future implementations. If an 
implementation claims to support a given category, it implements all the instructions in that 
category. 


Any attempt to execute an optional instruction that is not provided by the implementation 
will cause the illegal instruction error handler to be invoked. Exceptions to this rule are 
stated in the instruction descriptions found in Chapter 8, “Instruction Set.” 




4.1.3.3 Illegal Instruction Class 
Illegal instructions can be grouped into the following categories: 


• Instructions that are not implemented in the PowerPC architecture. These opcodes 
are available for future extensions of the PowerPC architecture; that is, future 
versions of the PowerPC architecture may define any of these instructions to 
perform new functions. The following primary opcodes are defined as illegal but 
may be used in future extensions to the architecture: 


1, 4, 5, 6, 56, 57, 60, 61 


• Instructions that are implemented in the PowerPC architecture but are not 
implemented in a specific PowerPC implementation. For example, instructions 
specific to 64-bit PowerPC processors are illegal for 32-bit processors. 


The following primary opcodes are defined for 64-bit implementations only and are 
illegal on 32-bit implementations: 


2, 30, 58, 62 


• All unused extended opcodes are illegal. The unused extended opcodes can be 
determined from information in Section A.2, “Instructions Sorted by Opcode,” and 
Section 4.1.3.4, “Reserved Instructions.” Note that extended opcodes for 
instructions that are defined only for 64-bit implementations are illegal in 32-bit 
implementations. The following primary opcodes have unused extended opcodes: 


19, 31, 59, 63 (primary opcodes 30 and 62 are illegal for 32-bit implementations, but 
as 64-bit opcodes they have some unused extended opcodes) 


• An instruction consisting entirely of zeros is guaranteed to be an illegal instruction. 
This increases the probability that an attempt to execute data or uninitialized 
memory invokes the illegal instruction error handler (a program exception). Note 
that if only the primary opcode consists of all zeros, the instruction is considered a 
reserved instruction, as described in Section 4.1.3.4, “Reserved Instructions.” 


An attempt to execute an illegal instruction invokes the illegal instruction error handler (a 
program exception) but has no other effect. See Section 6.4.7, “Program Exception 
(0x00700),” for additional information about illegal instruction exception. 


With the exception of the instruction consisting entirely of binary zeros, the illegal 
instructions are available for further additions to the PowerPC architecture. 
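The primary-opcode lists above can be turned into a first-pass check. The sketch below is hypothetical and deliberately incomplete: it recognizes the all-zero word, the reserved primary opcode 0, and the illegal primary opcodes listed for 32-bit implementations, and it defers every other opcode (including the extended-opcode checks for 19, 31, 59, and 63) to the opcode tables in Section A.2.

```c
#include <stdint.h>

/* Hypothetical first-pass classification by primary opcode only,
 * following the lists in Sections 4.1.3.3 and 4.1.3.4. */
enum opcode_class {
    CLASS_ILLEGAL_ALL_ZEROS,   /* the all-zero word: always illegal */
    CLASS_RESERVED_OPCODE_0,   /* primary opcode 0, word not all zeros */
    CLASS_ILLEGAL_FUTURE,      /* 1, 4, 5, 6, 56, 57, 60, 61 */
    CLASS_ILLEGAL_64_ONLY,     /* 2, 30, 58, 62 on 32-bit implementations */
    CLASS_NEEDS_TABLE_LOOKUP   /* decided by the full opcode tables */
};

enum opcode_class classify_primary_32(uint32_t insn)
{
    unsigned op = insn >> 26;          /* primary opcode, bits 0-5 */

    if (insn == 0)
        return CLASS_ILLEGAL_ALL_ZEROS;
    switch (op) {
    case 0:
        return CLASS_RESERVED_OPCODE_0;
    case 1: case 4: case 5: case 6:
    case 56: case 57: case 60: case 61:
        return CLASS_ILLEGAL_FUTURE;
    case 2: case 30: case 58: case 62:
        return CLASS_ILLEGAL_64_ONLY;
    default:
        return CLASS_NEEDS_TABLE_LOOKUP;
    }
}
```

A real decoder would continue with the extended-opcode tables for the CLASS_NEEDS_TABLE_LOOKUP case; the early return for the all-zero word reflects the rule that only that exact encoding is guaranteed illegal, while other opcode-0 words are reserved.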


4.1.3.4 Reserved Instructions 

Reserved instructions are allocated to specific implementation-dependent purposes not 
defined by the PowerPC architecture. An attempt to execute an unimplemented reserved 
instruction invokes the illegal instruction error handler (a program exception). See 
Section 6.4.7, “Program Exception (0x00700),” for additional information about illegal 
instruction exception. 
The following types of instructions are included in this class: 


1. Instructions for the POWER architecture that have not been included in the 
PowerPC architecture. 


2. Implementation-specific instructions used to conform to the PowerPC 
architecture specifications (for example, Load Data TLB Entry (tlbld) and 
Load Instruction TLB Entry (tlbli) instructions for the PowerPC 603™ 
microprocessor). 


3. The instruction with primary opcode 0, when the instruction does not consist 
entirely of binary zeros. 


4. Any other implementation-specific instructions that are not defined in the UISA, 
VEA, or OEA. 


4.1.4 Memory Addressing 


A program references memory using the effective (logical) address computed by the 
processor when it executes a load, store, branch, or cache instruction, and when it fetches 
the next sequential instruction. 


4.1.4.1 Memory Operands 


Bytes in memory are numbered consecutively starting with zero. Each number is the 
address of the corresponding byte. 


Memory operands may be bytes, half words, words, or double words, or, for the load/store 
multiple and load/store string instructions, a sequence of bytes or words. The address of a 
memory operand is the address of its first byte (that is, of its lowest-numbered byte). 
Operand length is implicit for each instruction. The PowerPC architecture supports both 
big-endian and little-endian byte ordering. The default byte and bit ordering is big-endian; 
see Section 3.1.2, “Byte Ordering,” for more information. 


The operand of a single-register memory access instruction has a natural alignment 
boundary equal to the operand length. In other words, the “natural” address of an operand 
is an integral multiple of the operand length. A memory operand is said to be aligned if it 
is aligned at its natural boundary; otherwise it is misaligned. For a detailed discussion about 
memory operands, see Chapter 3, “Operand Conventions.” 
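For the power-of-two operand lengths of single-register accesses (1, 2, 4, or 8 bytes), the natural-alignment rule reduces to a mask test on the low-order address bits. A minimal sketch, assuming the length is a power of two; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* An operand is aligned when its address is an integral multiple of its
 * length. For power-of-two lengths, that means the low-order
 * log2(length) address bits are all zero. */
bool is_naturally_aligned(uint32_t ea, uint32_t length)
{
    return (ea & (length - 1u)) == 0;
}
```

For example, a word at 0x1002 is misaligned, but a half word at the same address is aligned.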


4.1.4.2 Effective Address Calculation 


An effective address (EA) is the 32-bit sum computed by the processor when executing a 
memory access or branch instruction or when fetching the next sequential instruction. For 
a memory access instruction, if the sum of the effective address and the operand length 
exceeds the maximum effective address, the memory operand is considered to wrap around 
from the maximum effective address through effective address 0, as described in the 
following paragraphs. 


Effective address computations for both data and instruction accesses use 32-bit unsigned 
binary arithmetic. A carry from bit 0 is ignored. 
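C's uint32_t addition wraps modulo 2^32 in the same way, so it can be used to sketch the wrap-around rule. The helper names below are hypothetical. The second helper computes where each byte of a multi-byte operand lands.

```c
#include <stdint.h>

/* Effective-address addition: 32-bit unsigned, with the carry out of
 * bit 0 (the most-significant bit) discarded, i.e. modulo 2^32. */
uint32_t ea_add(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/* Address of byte i (0-based) of a memory operand that starts at ea.
 * A word at 0xFFFF_FFFE therefore occupies 0xFFFF_FFFE, 0xFFFF_FFFF,
 * 0x0000_0000, and 0x0000_0001. */
uint32_t operand_byte_address(uint32_t ea, unsigned i)
{
    return ea_add(ea, (uint32_t)i);
}
```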
In all implementations (including 32-bit mode in 64-bit implementations), the three low-order 
bits of the calculated effective address may be modified by the processor before 
accessing memory if the PowerPC system is operating in little-endian mode. See 
Section 3.1.2, “Byte Ordering,” for more information about little-endian mode. 
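Section 3.1.2 defines the exact modification. As a sketch only, assume the scheme used by 32-bit PowerPC little-endian mode, in which the three low-order effective address bits are XORed with 0b111, 0b110, 0b100, or 0b000 for byte, half-word, word, and double-word accesses, respectively. Under that assumption, the adjustment can be written as follows. The function name is hypothetical, and misaligned, multiple, and string accesses are not covered.

```c
#include <stdint.h>

/* Assumed little-endian address modification: XOR the three low-order
 * EA bits with a constant chosen by the access size in bytes. */
uint32_t le_modify_ea(uint32_t ea, unsigned size)
{
    switch (size) {
    case 1:  return ea ^ 7u;   /* byte:        0b111 */
    case 2:  return ea ^ 6u;   /* half word:   0b110 */
    case 4:  return ea ^ 4u;   /* word:        0b100 */
    default: return ea;        /* double word: unchanged */
    }
}
```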


Load and store operations have three categories of effective address generation that depend 
on the operands specified: 

• Register indirect with immediate index mode 

• Register indirect with index mode 

• Register indirect mode 


See Section 4.2.3.1, “Integer Load and Store Address Generation,” for a detailed 
description of effective address generation for load and store operations. 
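The three load/store modes can be sketched as C functions over a hypothetical array of 32 GPR values. The notation (rA|0) means the contents of rA, or the value 0 when the rA field is 0. The immediate displacement d is a 16-bit signed value that is sign-extended before the 32-bit unsigned addition. The function names are hypothetical.

```c
#include <stdint.h>

/* (rA|0): the contents of rA, or 0 when the rA field is 0. */
static uint32_t ra_or_zero(const uint32_t gpr[32], unsigned ra)
{
    return ra == 0 ? 0u : gpr[ra];
}

/* Register indirect with immediate index: EA = (rA|0) + EXTS(d). */
uint32_t ea_immediate_index(const uint32_t gpr[32], unsigned ra, int16_t d)
{
    return ra_or_zero(gpr, ra) + (uint32_t)(int32_t)d;   /* sign-extend, wrap mod 2^32 */
}

/* Register indirect with index: EA = (rA|0) + (rB). */
uint32_t ea_index(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    return ra_or_zero(gpr, ra) + gpr[rb];
}

/* Register indirect: EA = (rA|0). */
uint32_t ea_register_indirect(const uint32_t gpr[32], unsigned ra)
{
    return ra_or_zero(gpr, ra);
}
```

Note that a nonzero value in GPR0 is ignored whenever it appears in the rA position; this is why r0 cannot serve as a base register for these forms.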


Branch instructions have three categories of effective address generation: 


• Immediate addressing 
• Link register indirect 
• Count register indirect 


See Section 4.2.4.1, “Branch Instruction Address Calculation,” for a detailed 
description of effective address generation for branch instructions. 


Branch instructions can optionally load the LR with the next sequential instruction address 
(current instruction address + 4). 
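For the unconditional I-form branch, immediate addressing can be sketched as follows. The 24-bit LI field (bits 6-29) is concatenated with 0b00 and sign-extended. The result is then either used directly (AA = 1) or added to the current instruction address (AA = 0). When LK = 1, the branch also loads the LR. This is a minimal, hypothetical decoder that assumes that field layout; conditional branches and the LR- and CTR-indirect forms are not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoder for the I-form branch (b, ba, bl, bla):
 * LI in bits 6-29, AA in bit 30, LK in bit 31. */
uint32_t branch_target(uint32_t insn, uint32_t cia)
{
    uint32_t disp = insn & 0x03FFFFFCu;   /* LI || 0b00, a 26-bit value */
    if (disp & 0x02000000u)               /* sign bit of the 26-bit value */
        disp |= 0xFC000000u;              /* sign-extend to 32 bits */
    bool aa = (insn & 0x2u) != 0;         /* absolute addressing? */
    return aa ? disp : cia + disp;        /* relative: 32-bit wrap-around */
}

/* With LK = 1, the LR receives the next sequential instruction address. */
bool branch_sets_lr(uint32_t insn, uint32_t cia, uint32_t *lr)
{
    if ((insn & 0x1u) == 0)
        return false;
    *lr = cia + 4u;
    return true;
}
```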


4.1.5 Synchronizing Instructions 


The synchronization described in this section refers to the state of activities within the 
processor that is performing the synchronization. Refer to Section 6.1.2, 
“Synchronization,” for more detailed information about other conditions that can cause 
context and execution synchronization. 


4.1.5.1 Context Synchronizing Instructions 

The System Call (sc), Return from Interrupt (rfi), and Instruction Synchronize (isync) 
instructions perform context synchronization by allowing previously issued instructions to 
complete before performing a context switch. Execution of one of these instructions 
ensures the following: 


1. No higher priority exception exists (sc) and instruction dispatching is halted. 


2. All previous instructions have completed to a point where they can no longer cause 
an exception. 


If a prior memory access instruction causes one or more direct-store interface error 
exceptions, the results are guaranteed to be determined before this instruction is 
executed. However, note that the direct-store facility is being phased out of the 
architecture and will not likely be supported in future devices. 
3. Previous instructions complete execution in the context (privilege, protection, and 
address translation) under which they were issued. 


4. The instructions following the sc, rfi, or isync instruction execute in the context 
established by these instructions. 


4.1.5.2 Execution Synchronizing Instructions 


An instruction is execution synchronizing if it satisfies the conditions of the first two items 
described above for context synchronization. The sync instruction is treated like isync with 
respect to the second item described above (that is, the conditions described in the second 
item apply to the completion of sync). The sync and mtmsr instructions are examples of 
execution-synchronizing instructions. 


All context-synchronizing instructions are execution-synchronizing. Unlike a context 
synchronizing operation, an execution synchronizing instruction need not ensure that the 
instructions following it execute in the context established by that instruction. This new 
context becomes effective sometime after the execution synchronizing instruction 
completes and before or at a subsequent context synchronizing operation. 


4.1.6 Exception Summary 


PowerPC processors have an exception mechanism for handling system functions and error 
conditions in an orderly way. The exception model is defined by the OEA. There are two 
kinds of exceptions—those caused directly by the execution of an instruction and those 
caused by an asynchronous event. Either may cause components of the system software to 
be invoked. 


Exceptions can be caused directly by the execution of an instruction as follows: 


• An attempt to execute an illegal instruction causes the illegal instruction (program 
exception) error handler to be invoked. An attempt by a user-level program to 
execute the supervisor-level instructions listed below causes the privileged 
instruction (program exception) handler to be invoked. 


The PowerPC architecture provides the following supervisor-level instructions: 
dcbi, mfmsr, mfspr, mfsr, mfsrin, mtmsr, mtspr, mtsr, mtsrin, rfi, tlbia, tlbie, 
and tlbsync (defined by OEA). Note that the privilege level of the mfspr and mtspr 
instructions depends on the SPR encoding. 


• The execution of a defined instruction using an invalid form causes either the illegal 
instruction error handler or the privileged instruction handler to be invoked. 


• The execution of an optional instruction that is not provided by the implementation 
causes the illegal instruction error handler to be invoked. 


• An attempt to access memory in a manner that violates memory protection, or an 
attempt to access memory that is not available (page fault), causes the DSI exception 
handler or ISI exception handler to be invoked. 
• An attempt to access memory with an effective address alignment that is invalid for 
the instruction causes the alignment exception handler to be invoked. 


• The execution of an sc instruction permits a program to call on the system to perform 
a service by causing a system call exception handler to be invoked. 


• The execution of a trap instruction invokes the program exception trap handler. 


• The execution of a floating-point instruction when floating-point instructions are 
disabled invokes the floating-point unavailable exception handler. 


• The execution of an instruction that causes a floating-point exception that is enabled 
invokes the floating-point enabled exception handler. 


• The execution of a floating-point instruction that requires system software assistance 
causes the floating-point assist exception handler to be invoked. The conditions 
under which such software assistance is required are implementation-dependent. 

Exceptions caused by asynchronous events are described in Chapter 6, “Exceptions.” 


4.2 PowerPC UISA Instructions 


The PowerPC user instruction set architecture (UISA) includes the base user-level 
instruction set (excluding a few user-level cache-control, synchronization, and time base 
instructions), user-level registers, programming model, data types, and addressing modes. 
This section discusses the instructions defined in the UISA. 


4.2.1 Integer Instructions 
The integer instructions consist of the following: 


• Integer arithmetic instructions 

• Integer compare instructions 

• Integer logical instructions 

• Integer rotate and shift instructions 


Integer instructions use the content of the GPRs as source operands and place results into 
GPRs. Integer arithmetic, shift, rotate, and string move instructions may update or read 
values from the XER, and the condition register (CR) fields may be updated if the Rc bit of 
the instruction is set. 


These instructions treat the source operands as signed integers unless the instruction is 
explicitly identified as performing an unsigned operation. For example, Multiply High 
Word Unsigned (mulhwu) and Divide Word Unsigned (divwu) instructions interpret both 
operands as unsigned integers. 
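The same 32-bit pattern gives different results under the signed and unsigned forms. The sketch below models divw and divwu in C. It assumes a host whose int32_t conversion is two's complement, as on mainstream compilers. The functions return false for the divisions whose result the architecture leaves undefined (a zero divisor, or 0x8000_0000 divided by -1 for divw), which are also undefined in C. The function names mirror the mnemonics but are hypothetical models.

```c
#include <stdbool.h>
#include <stdint.h>

/* divw: operands interpreted as signed; quotient truncated toward zero. */
bool divw(uint32_t ra, uint32_t rb, uint32_t *rd)
{
    int32_t a = (int32_t)ra, b = (int32_t)rb;
    if (b == 0 || (a == INT32_MIN && b == -1))
        return false;              /* rD would be undefined */
    *rd = (uint32_t)(a / b);
    return true;
}

/* divwu: the same bits interpreted as unsigned. */
bool divwu(uint32_t ra, uint32_t rb, uint32_t *rd)
{
    if (rb == 0)
        return false;              /* rD would be undefined */
    *rd = ra / rb;
    return true;
}
```

For example, with rA = 0xFFFF_FFF9 and rB = 2, divw yields 0xFFFF_FFFD (-7 / 2 = -3, truncated toward zero), whereas divwu yields 0x7FFF_FFFC.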


The integer instructions that are coded to update the condition register, and the integer 
arithmetic instruction addic., set CR bits 0-3 (CR0) to characterize the result of the 
operation. CR0 is set to reflect a signed comparison of the result to zero. 
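Setting CR0 from a result can be sketched as a signed comparison with zero. The sketch assumes the standard CR field layout: LT, GT, EQ, and SO from most-significant to least-significant bit, with the SO bit copied from XER[SO]. It returns the field as a 4-bit value, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR0 as a 4-bit field, bit 0 (LT) most significant:
 * LT = 0b1000, GT = 0b0100, EQ = 0b0010, SO = 0b0001 (copy of XER[SO]). */
unsigned cr0_field(uint32_t result, bool xer_so)
{
    int32_t r = (int32_t)result;                      /* signed comparison with zero */
    unsigned field = r < 0 ? 0x8u : r > 0 ? 0x4u : 0x2u;
    return field | (xer_so ? 0x1u : 0x0u);
}
```

Because the comparison is signed, a result of 0x8000_0000 sets LT even though it is a large unsigned value.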
The integer arithmetic instructions addic, addic., subfic, addc, subfc, adde, subfe, 
addme, subfme, addze, and subfze always set the XER carry bit (CA) to reflect the carry 
out of bit 0. Integer arithmetic instructions with the overflow enable (OE) bit set in the 
instruction encoding (instructions with the o suffix) cause XER[SO] and XER[OV] to reflect 
an overflow of the result. Except for the multiply low and divide instructions, these integer 
arithmetic instructions reflect the overflow of the result. 
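The carry and overflow rules can be sketched in C. The helpers below are hypothetical models, not processor code. The first behaves like addco: the carry out of bit 0 goes to CA, signed overflow goes to OV, and SO is sticky. The second behaves like adde. Chaining them shows the usual reason CA exists, which is multiword arithmetic. A 64-bit add on 32-bit registers is addc on the low words followed by adde on the high words. A 64-bit subtract is subfc followed by subfe, each computing ¬(rA) + (rB) + carry.

```c
#include <stdbool.h>
#include <stdint.h>

struct xer_bits { bool so, ov, ca; };   /* hypothetical model of three XER bits */

/* Like addco: sum, carry out of bit 0, and signed overflow (SO is sticky). */
uint32_t add_carrying(uint32_t a, uint32_t b, struct xer_bits *x)
{
    uint32_t sum = a + b;
    x->ca = sum < a;                                  /* carry out of bit 0 */
    x->ov = (((a ^ sum) & (b ^ sum)) >> 31) != 0;     /* same-sign operands, sign changed */
    x->so = x->so || x->ov;                           /* summary overflow stays set */
    return sum;
}

/* Like adde: (rA) + (rB) + XER[CA], with the new carry placed in XER[CA]. */
uint32_t add_extended(uint32_t a, uint32_t b, struct xer_bits *x)
{
    uint64_t wide = (uint64_t)a + b + (x->ca ? 1u : 0u);
    x->ca = (wide >> 32) != 0;
    return (uint32_t)wide;
}

/* 64-bit add from 32-bit halves: addc on the low words, adde on the high. */
uint64_t add64(uint64_t p, uint64_t q)
{
    struct xer_bits x = {false, false, false};
    uint32_t lo = add_carrying((uint32_t)p, (uint32_t)q, &x);
    uint32_t hi = add_extended((uint32_t)(p >> 32), (uint32_t)(q >> 32), &x);
    return ((uint64_t)hi << 32) | lo;
}

/* minuend - subtrahend: subfc (~s + m + 1) on the low words,
 * then subfe (~s + m + CA) on the high words. */
uint64_t sub64(uint64_t minuend, uint64_t subtrahend)
{
    struct xer_bits x = {false, false, true};        /* CA = 1 supplies subfc's "+ 1" */
    uint32_t lo = add_extended(~(uint32_t)subtrahend, (uint32_t)minuend, &x);
    uint32_t hi = add_extended(~(uint32_t)(subtrahend >> 32),
                               (uint32_t)(minuend >> 32), &x);
    return ((uint64_t)hi << 32) | lo;
}
```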


Instructions that select the overflow option (enable XER[OV]) or that set the XER carry bit 
(CA) may delay the execution of subsequent instructions. 


Unless otherwise noted, when CR0 and the XER are set, they reflect the value placed in the 
target register. 


4.2.1.1 Integer Arithmetic Instructions 
Table 4-1 lists the integer arithmetic instructions for the PowerPC processors. 
Table 4-1. Integer Arithmetic Instructions 

Add Immediate (rD,rA,SIMM) 
    The sum (rA|0) + SIMM is placed into rD. 
    addi      Add Immediate 

Add Immediate Shifted (rD,rA,SIMM) 
    The sum (rA|0) + (SIMM || 0x0000) is placed into rD. 
    addis     Add Immediate Shifted 

Add (rD,rA,rB) 
    The sum (rA) + (rB) is placed into rD. 
    add       Add 
    add.      Add with CR Update. The dot suffix enables the update of the CR. 
    addo      Add with Overflow Enabled. The o suffix enables the overflow bit (OV) 
              in the XER. 
    addo.     Add with Overflow and CR Update. The o. suffix enables the update of 
              the CR and enables the overflow bit (OV) in the XER. 

Subtract From (rD,rA,rB) 
    The sum ¬(rA) + (rB) + 1 is placed into rD. 
    subf      Subtract From 
    subf.     Subtract from with CR Update. The dot suffix enables the update of 
              the CR. 
    subfo     Subtract from with Overflow Enabled. The o suffix enables the 
              overflow bit (OV) in the XER. 
    subfo.    Subtract from with Overflow and CR Update. The o. suffix enables 
              the update of the CR and enables the overflow bit (OV) in the XER. 

Add Immediate Carrying (rD,rA,SIMM) 
    The sum (rA) + SIMM is placed into rD. 
    addic     Add Immediate Carrying 

Add Immediate Carrying and Record (rD,rA,SIMM) 
    The sum (rA) + SIMM is placed into rD. The CR is updated. 
    addic.    Add Immediate Carrying and Record 

Subtract from Immediate Carrying (rD,rA,SIMM) 
    The sum ¬(rA) + SIMM + 1 is placed into rD. 
    subfic    Subtract from Immediate Carrying 
Table 4-1. Integer Arithmetic Instructions (Continued) 

Add Carrying (rD,rA,rB) 
    The sum (rA) + (rB) is placed into rD. 
    addc      Add Carrying 
    addc.     Add Carrying with CR Update. The dot suffix enables the update of 
              the CR. 
    addco     Add Carrying with Overflow Enabled. The o suffix enables the 
              overflow bit (OV) in the XER. 
    addco.    Add Carrying with Overflow and CR Update. The o. suffix enables the 
              update of the CR and enables the overflow bit (OV) in the XER. 

Subtract from Carrying (rD,rA,rB) 
    The sum ¬(rA) + (rB) + 1 is placed into rD. 
    subfc     Subtract from Carrying 
    subfc.    Subtract from Carrying with CR Update. The dot suffix enables the 
              update of the CR. 
    subfco    Subtract from Carrying with Overflow. The o suffix enables the 
              overflow bit (OV) in the XER. 
    subfco.   Subtract from Carrying with Overflow and CR Update. The o. suffix 
              enables the update of the CR and enables the overflow bit (OV) in 
              the XER. 

Add Extended (rD,rA,rB) 
    The sum (rA) + (rB) + XER[CA] is placed into rD. 
    adde      Add Extended 
    adde.     Add Extended with CR Update. The dot suffix enables the update of 
              the CR. 
    addeo     Add Extended with Overflow. The o suffix enables the overflow bit 
              (OV) in the XER. 
    addeo.    Add Extended with Overflow and CR Update. The o. suffix enables the 
              update of the CR and enables the overflow bit (OV) in the XER. 

Subtract from Extended (rD,rA,rB) 
    The sum ¬(rA) + (rB) + XER[CA] is placed into rD. 
    subfe     Subtract from Extended 
    subfe.    Subtract from Extended with CR Update. The dot suffix enables the 
              update of the CR. 
    subfeo    Subtract from Extended with Overflow. The o suffix enables the 
              overflow bit (OV) in the XER. 
    subfeo.   Subtract from Extended with Overflow and CR Update. The o. suffix 
              enables the update of the CR and enables the overflow (OV) bit in 
              the XER. 

Add to Minus One Extended (rD,rA) 
    The sum (rA) + XER[CA] added to 0xFFFF_FFFF is placed into rD. 
    addme     Add to Minus One Extended 
    addme.    Add to Minus One Extended with CR Update. The dot suffix enables 
              the update of the CR. 
    addmeo    Add to Minus One Extended with Overflow. The o suffix enables the 
              overflow bit (OV) in the XER. 
    addmeo.   Add to Minus One Extended with Overflow and CR Update. The o. 
              suffix enables the update of the CR and enables the overflow (OV) 
              bit in the XER. 
Table 4-1. Integer Arithmetic Instructions (Continued) 

Subtract from Minus One Extended (rD,rA) 
    The sum ¬(rA) + XER[CA] added to 0xFFFF_FFFF is placed into rD. 
    subfme    Subtract from Minus One Extended 
    subfme.   Subtract from Minus One Extended with CR Update. The dot suffix 
              enables the update of the CR. 
    subfmeo   Subtract from Minus One Extended with Overflow. The o suffix 
              enables the overflow bit (OV) in the XER. 
    subfmeo.  Subtract from Minus One Extended with Overflow and CR Update. The 
              o. suffix enables the update of the CR and enables the overflow 
              bit (OV) in the XER. 

Add to Zero Extended (rD,rA) 
    The sum (rA) + XER[CA] is placed into rD. 
    addze     Add to Zero Extended 
    addze.    Add to Zero Extended with CR Update. The dot suffix enables the 
              update of the CR. 
    addzeo    Add to Zero Extended with Overflow. The o suffix enables the 
              overflow bit (OV) in the XER. 
    addzeo.   Add to Zero Extended with Overflow and CR Update. The o. suffix 
              enables the update of the CR and enables the overflow bit (OV) in 
              the XER. 

Subtract from Zero Extended (rD,rA) 
    The sum ¬(rA) + XER[CA] is placed into rD. 
    subfze    Subtract from Zero Extended 
    subfze.   Subtract from Zero Extended with CR Update. The dot suffix enables 
              the update of the CR. 
    subfzeo   Subtract from Zero Extended with Overflow. The o suffix enables the 
              overflow bit (OV) in the XER. 
    subfzeo.  Subtract from Zero Extended with Overflow and CR Update. The o. 
              suffix enables the update of the CR and enables the overflow bit 
              (OV) in the XER. 

Negate (rD,rA) 
    The sum ¬(rA) + 1 is placed into rD. 
    neg       Negate 
    neg.      Negate with CR Update. The dot suffix enables the update of the CR. 
    nego      Negate with Overflow. The o suffix enables the overflow bit (OV) in 
              the XER. 
    nego.     Negate with Overflow and CR Update. The o. suffix enables the update 
              of the CR and enables the overflow bit (OV) in the XER. 

Multiply Low Immediate (rD,rA,SIMM) 
    The low-order 32 bits of the product (rA) * SIMM are placed into rD. This 
    instruction can be used with mulhdx or mulhwx to calculate a full 64-bit 
    product. 
    mulli     Multiply Low Immediate 
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Table 4-1. Integer Arithmetic Instructions (Continued) 


Operand ‘ 
tame nmemonic| am’ | avon 


Multiply Low rD,rA,rB The 32-bit product (rA) * (rB) is placed into register rD. 


This instruction can be used with mulhwx to calculate a full 64-bit 
product. 


mullw Multiply Low 

mullw. — Multiply Low with CR Update. The dot suffix enables the 
update of the CR. 

mullwo = Multiply Low with Overflow. The o suffix enables the overflow 
bit (OV) in the XER. 

mullwo. Multiply Low with Overflow and CR Update. The o. suffix 
enables the update of the condition register and enables the 
overflow bit (OV) in the XER. 


Multiply High Word (mulhw rD,rA,rB)
The contents of rA and rB are interpreted as 32-bit signed integers. The 64-bit product is formed. The high-order 32 bits of the 64-bit product are placed into rD.
mulhw     Multiply High Word
mulhw.    Multiply High Word with CR Update. The dot suffix enables the update of the CR.


Multiply High Word Unsigned (mulhwu rD,rA,rB)
The contents of rA and of rB are interpreted as 32-bit unsigned integers. The 64-bit product is formed. The high-order 32 bits of the 64-bit product are placed into rD.
mulhwu    Multiply High Word Unsigned
mulhwu.   Multiply High Word Unsigned with CR Update. The dot suffix enables the update of the CR.
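As a sketch of how mullw and mulhwu (or mulhw, for signed operands) combine into a full 64-bit product, the hypothetical C functions below model each instruction on 32-bit values and then join the two halves. The low-order word of the product is the same whether the operands are treated as signed or unsigned, which is why a single Multiply Low Word instruction serves both cases.

```c
#include <assert.h>
#include <stdint.h>

/* mullw: low-order 32 bits of the product (identical for signed and unsigned). */
static uint32_t model_mullw(uint32_t ra, uint32_t rb)
{
    return (uint32_t)((uint64_t)ra * rb);
}

/* mulhw: high-order 32 bits of the signed 64-bit product. */
static uint32_t model_mulhw(uint32_t ra, uint32_t rb)
{
    int64_t p = (int64_t)(int32_t)ra * (int64_t)(int32_t)rb;
    return (uint32_t)((uint64_t)p >> 32);
}

/* mulhwu: high-order 32 bits of the unsigned 64-bit product. */
static uint32_t model_mulhwu(uint32_t ra, uint32_t rb)
{
    uint64_t p = (uint64_t)ra * (uint64_t)rb;
    return (uint32_t)(p >> 32);
}

/* Full unsigned product, as a mulhwu/mullw pair would leave it in two GPRs. */
static uint64_t full_product_u(uint32_t ra, uint32_t rb)
{
    return ((uint64_t)model_mulhwu(ra, rb) << 32) | model_mullw(ra, rb);
}
```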


Divide Word (divw rD,rA,rB)
The dividend is the signed value of rA. The divisor is the signed value of rB. The quotient is placed into rD. The remainder is not supplied as a result.
divw      Divide Word
divw.     Divide Word with CR Update. The dot suffix enables the update of the CR.
divwo     Divide Word with Overflow. The o suffix enables the overflow bit (OV) in the XER.
divwo.    Divide Word with Overflow and CR Update. The o. suffix enables the update of the CR and enables the overflow bit (OV) in the XER.


Divide Word Unsigned (divwu rD,rA,rB)
The dividend is the zero-extended value in rA. The divisor is the zero-extended value in rB. The quotient is placed into rD. The remainder is not supplied as a result.
divwu     Divide Word Unsigned
divwu.    Divide Word Unsigned with CR Update. The dot suffix enables the update of the CR.
divwuo    Divide Word Unsigned with Overflow. The o suffix enables the overflow bit (OV) in the XER.
divwuo.   Divide Word Unsigned with Overflow and CR Update. The o. suffix enables the update of the CR and enables the overflow bit (OV) in the XER.
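Because divw and divwu do not return a remainder, software typically recovers it with a multiply and a subtract. The hypothetical C sketch below models the quotient (truncated toward zero) and a remainder computed as (rA) - quotient * (rB), as a divw, mullw, subf sequence would. It reports division by zero, and 0x8000_0000 divided by -1 for divw, as having no defined quotient; the full instruction descriptions leave rD undefined in those cases and, for divwo and divwuo, set XER[OV].

```c
#include <assert.h>
#include <stdint.h>

/* divw model: returns 0 (leaving *q untouched) when rD would be undefined. */
static int model_divw(int32_t ra, int32_t rb, int32_t *q)
{
    if (rb == 0 || (ra == INT32_MIN && rb == -1))
        return 0;
    *q = ra / rb;   /* C99 signed division also truncates toward zero */
    return 1;
}

/* divwu model: unsigned quotient; only division by zero is undefined. */
static int model_divwu(uint32_t ra, uint32_t rb, uint32_t *q)
{
    if (rb == 0)
        return 0;
    *q = ra / rb;
    return 1;
}

/* Remainder as a divw/mullw/subf sequence would form it: rA - (q * rB). */
static int32_t remainder_via_mullw_subf(int32_t ra, int32_t rb, int32_t q)
{
    uint32_t prod = (uint32_t)q * (uint32_t)rb;   /* mullw: low-order 32 bits */
    return (int32_t)((uint32_t)ra - prod);        /* subf: rA - prod */
}
```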
Although there is no “Subtract Immediate” instruction, its effect can be achieved by using
an addi instruction with the immediate operand negated. Simplified mnemonics are 
provided that include this negation. The subf instructions subtract the second operand (rA) 
from the third operand (rB). Simplified mnemonics are provided in which the third operand 
is subtracted from the second operand. See Appendix F, “Simplified Mnemonics,” for 
examples. 
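The operand order of subf is easy to get backwards. This hypothetical C sketch models subf as ¬(rA) + (rB) + 1, which equals (rB) - (rA), and models addi with a negated immediate as a stand-in for a subtract immediate. It assumes rA is not r0 for addi, because addi treats an rA field of 0 as the value zero. In Appendix F, for example, sub rD,rA,rB stands for subf rD,rB,rA, and subi stands for addi with the immediate negated.

```c
#include <assert.h>
#include <stdint.h>

/* subf rD,rA,rB: rD = ~(rA) + (rB) + 1, that is, (rB) - (rA). */
static uint32_t model_subf(uint32_t ra, uint32_t rb)
{
    return (uint32_t)(~ra + rb + 1u);
}

/* addi rD,rA,SIMM, assuming rA != r0: SIMM is sign-extended before the add. */
static uint32_t model_addi(uint32_t ra, int16_t simm)
{
    return ra + (uint32_t)(int32_t)simm;
}
```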


4.2.1.2 Integer Compare Instructions 

The integer compare instructions algebraically or logically compare the contents of register 
rA with either the zero-extended value of the UIMM operand, the sign-extended value of 
the SIMM operand, or the contents of register rB. The comparison is signed for the cmpi
and cmp instructions, and unsigned for the cmpli and cmpl instructions. Table 4-2
summarizes the integer compare instructions. 


For 32-bit implementations, the L field must be cleared; otherwise, the instruction form is invalid.


The integer compare instructions (shown in Table 4-2) set one of the leftmost three bits of 
the designated CR field, and clear the other two. XER[SO] is copied into bit 3 of the CR 
field. 


Table 4-2. Integer Compare Instructions

Compare Immediate (cmpi crfD,L,rA,SIMM)
The value in register rA is compared with the sign-extended value of the SIMM operand, treating the operands as signed integers. The result of the comparison is placed into the CR field specified by operand crfD.

Compare (cmp crfD,L,rA,rB)
The value in register rA is compared with the value in register rB, treating the operands as signed integers. The result of the comparison is placed into the CR field specified by operand crfD.

Compare Logical Immediate (cmpli crfD,L,rA,UIMM)
The value in register rA is compared with 0x0000 || UIMM, treating the operands as unsigned integers. The result of the comparison is placed into the CR field specified by operand crfD.

Compare Logical (cmpl crfD,L,rA,rB)
The value in register rA is compared with the value in register rB, treating the operands as unsigned integers. The result of the comparison is placed into the CR field specified by operand crfD.


The crfD operand can be omitted if the result of the comparison is to be placed in CR0.
Otherwise, the target CR field must be specified in the instruction crfD field, using an
explicit field number.





For information on simplified mnemonics for the integer compare instructions see 
Appendix F, “Simplified Mnemonics.” 
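To make the CR field encoding concrete, the following hypothetical C sketch computes the 4-bit value that the compare instructions place in crfD, with bit 0 (LT) as the most significant bit of the field and XER[SO] copied into bit 3. The cmpi and cmpli wrappers show the difference between sign-extending SIMM and zero-extending UIMM.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of a 4-bit CR field: bit 0 (LT) is the most significant, bit 3 gets XER[SO]. */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* cmp/cmpi: signed comparison. */
static unsigned cr_field_signed(int32_t a, int32_t b, int xer_so)
{
    unsigned f = (a < b) ? CR_LT : (a > b) ? CR_GT : CR_EQ;
    return f | (xer_so ? CR_SO : 0u);
}

/* cmpl/cmpli: unsigned (logical) comparison. */
static unsigned cr_field_unsigned(uint32_t a, uint32_t b, int xer_so)
{
    unsigned f = (a < b) ? CR_LT : (a > b) ? CR_GT : CR_EQ;
    return f | (xer_so ? CR_SO : 0u);
}

/* cmpi: SIMM is sign-extended to 32 bits before the signed compare. */
static unsigned model_cmpi(uint32_t ra, int16_t simm, int xer_so)
{
    return cr_field_signed((int32_t)ra, (int32_t)simm, xer_so);
}

/* cmpli: UIMM is zero-extended (0x0000 || UIMM) before the logical compare. */
static unsigned model_cmpli(uint32_t ra, uint16_t uimm, int xer_so)
{
    return cr_field_unsigned(ra, (uint32_t)uimm, xer_so);
}
```

Comparing 0xFFFF_FFFF with 1 sets LT for the signed compare and GT for the logical compare.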
4.2.1.3 Integer Logical Instructions

The logical instructions shown in Table 4-3 perform bit-parallel operations on 32-bit
operands. Logical instructions with CR updating enabled (that is, with the dot suffix) and
the instructions andi. and andis. set CR field CR0 (bits 0 to 2) to characterize the result of the
logical operation. Logical instructions without CR update and the remaining logical
instructions do not modify the CR. Logical instructions do not affect the XER[SO],
XER[OV], and XER[CA] bits.


See Appendix F, “Simplified Mnemonics,” for simplified mnemonic examples for integer 
logical operations. 


Table 4-3. Integer Logical Instructions

AND Immediate (andi. rA,rS,UIMM)
The contents of rS are ANDed with 0x0000 || UIMM and the result is placed into rA. The CR is updated.

AND Immediate Shifted (andis. rA,rS,UIMM)
The contents of rS are ANDed with UIMM || 0x0000 and the result is placed into rA. The CR is updated.

OR Immediate (ori rA,rS,UIMM)
The contents of rS are ORed with 0x0000 || UIMM and the result is placed into rA. The preferred no-op is ori 0,0,0.

OR Immediate Shifted (oris rA,rS,UIMM)
The contents of rS are ORed with UIMM || 0x0000 and the result is placed into rA.

XOR Immediate (xori rA,rS,UIMM)
The contents of rS are XORed with 0x0000 || UIMM and the result is placed into rA.

XOR Immediate Shifted (xoris rA,rS,UIMM)
The contents of rS are XORed with UIMM || 0x0000 and the result is placed into rA.

AND (and rA,rS,rB)
The contents of rS are ANDed with the contents of register rB and the result is placed into rA.
and       AND
and.      AND with CR Update. The dot suffix enables the update of the CR.

OR (or rA,rS,rB)
The contents of rS are ORed with the contents of rB and the result is placed into rA.
or        OR
or.       OR with CR Update. The dot suffix enables the update of the CR.

XOR (xor rA,rS,rB)
The contents of rS are XORed with the contents of rB and the result is placed into rA.
xor       XOR
xor.      XOR with CR Update. The dot suffix enables the update of the CR.
Table 4-3. Integer Logical Instructions (Continued)

NAND (nand rA,rS,rB)
The contents of rS are ANDed with the contents of rB and the one's complement of the result is placed into rA.
nand      NAND
nand.     NAND with CR Update. The dot suffix enables the update of the CR.
Note that nandx, with rS = rB, can be used to obtain the one's complement.

NOR (nor rA,rS,rB)
The contents of rS are ORed with the contents of rB and the one's complement of the result is placed into rA.
nor       NOR
nor.      NOR with CR Update. The dot suffix enables the update of the CR.
Note that norx, with rS = rB, can be used to obtain the one's complement.

Equivalent (eqv rA,rS,rB)
The contents of rS are XORed with the contents of rB and the complemented result is placed into rA.
eqv       Equivalent
eqv.      Equivalent with CR Update. The dot suffix enables the update of the CR.

AND with Complement (andc rA,rS,rB)
The contents of rS are ANDed with the one's complement of the contents of rB and the result is placed into rA.
andc      AND with Complement
andc.     AND with Complement with CR Update. The dot suffix enables the update of the CR.

OR with Complement (orc rA,rS,rB)
The contents of rS are ORed with the one's complement of the contents of rB and the result is placed into rA.
orc       OR with Complement
orc.      OR with Complement with CR Update. The dot suffix enables the update of the CR.

Extend Sign Byte (extsb rA,rS)
The contents of the low-order eight bits of rS are placed into the low-order eight bits of rA. Bit 24 of rS is placed into the remaining high-order bits of rA.
extsb     Extend Sign Byte
extsb.    Extend Sign Byte with CR Update. The dot suffix enables the update of the CR.

Extend Sign Half Word (extsh rA,rS)
The contents of the low-order 16 bits of rS are placed into the low-order 16 bits of rA. Bit 16 of rS is placed into the remaining high-order bits of rA.
extsh     Extend Sign Half Word
extsh.    Extend Sign Half Word with CR Update. The dot suffix enables the update of the CR.

Count Leading Zeros Word (cntlzw rA,rS)
A count of the number of consecutive zero bits starting at bit 0 of rS is placed into rA. This number ranges from 0 to 32, inclusive. If Rc = 1 (dot suffix), LT is cleared in CR0.
cntlzw    Count Leading Zeros Word
cntlzw.   Count Leading Zeros Word with CR Update. The dot suffix enables the update of the CR.
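The following hypothetical C sketch models several operations from Table 4-3 on 32-bit values: cntlzw counting from bit 0 (the most significant bit), extsb and extsh replicating bit 24 or bit 16 into the high-order bits, eqv as the complement of XOR, and andc. The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* cntlzw: number of consecutive 0 bits starting at bit 0 (the MSB); 0..32. */
static uint32_t model_cntlzw(uint32_t rs)
{
    uint32_t n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0u && (rs & bit) == 0u; bit >>= 1)
        n++;
    return n;
}

/* extsb: keep the low-order byte and copy bit 24 (its sign) into bits 0-23. */
static uint32_t model_extsb(uint32_t rs)
{
    uint32_t b = rs & 0xFFu;
    return (b & 0x80u) ? (b | 0xFFFFFF00u) : b;
}

/* extsh: keep the low-order half word and copy bit 16 into bits 0-15. */
static uint32_t model_extsh(uint32_t rs)
{
    uint32_t h = rs & 0xFFFFu;
    return (h & 0x8000u) ? (h | 0xFFFF0000u) : h;
}

/* eqv: one's complement of (rS XOR rB). */
static uint32_t model_eqv(uint32_t rs, uint32_t rb)
{
    return ~(rs ^ rb);
}

/* andc: rS AND the one's complement of rB. */
static uint32_t model_andc(uint32_t rs, uint32_t rb)
{
    return rs & ~rb;
}
```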





4.2.1.4 Integer Rotate and Shift Instructions 

Rotation operations are performed on data from a GPR, and the result, or a portion of the 
result, is returned to a GPR. The rotation operations rotate a 32-bit quantity left by a 
specified number of bit positions. Bits that exit from position 0 enter at position 31. 


The rotate and shift instructions employ a mask generator. The mask is 32 bits long and 
consists of ‘1’ bits from a start bit, Mstart, through and including a stop bit, Mstop, and ‘0’ 
bits elsewhere. The values of Mstart and Mstop range from 0 to 31. If Mstart > Mstop, the 
‘1’ bits wrap around from position 31 to position 0. Thus the mask is formed as follows: 


if mstart ≤ mstop then
    mask[mstart–mstop] = ones
    mask[all other bits] = zeros
else
    mask[mstart–31] = ones
    mask[0–mstop] = ones
    mask[all other bits] = zeros


It is not possible to specify an all-zero mask. The use of the mask is described in the 
following sections. 
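The mask rule above can be written as a small C function. This is a sketch assuming bits are numbered as in this manual, with bit 0 the most significant bit of the word; ppc_mask and ppc_bit are hypothetical names. The loop always sets bit mstart, which is one way to see why an all-zero mask cannot be specified.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i in the manual's numbering (0 = most significant) as a C mask. */
static uint32_t ppc_bit(unsigned i)
{
    return 0x80000000u >> i;
}

/* 1 bits from mstart through mstop inclusive, wrapping from bit 31 to bit 0
   when mstart > mstop. */
static uint32_t ppc_mask(unsigned mstart, unsigned mstop)
{
    assert(mstart < 32u && mstop < 32u);
    uint32_t mask = 0;
    unsigned i = mstart;
    for (;;) {
        mask |= ppc_bit(i);
        if (i == mstop)
            break;
        i = (i + 1u) & 31u;   /* wrap from 31 back to 0 */
    }
    return mask;
}
```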


If CR updating is enabled, rotate and shift instructions set CR0[0-2] according to the
contents of rA at the completion of the instruction. Rotate and shift instructions do not
change the values of XER[OV] and XER[SO] bits. Rotate and shift instructions, except
algebraic right shifts, do not change the XER[CA] bit.


See Appendix F, “Simplified Mnemonics,” for a complete list of simplified mnemonics that
allow simpler coding of often-used functions such as clearing the leftmost or rightmost
bits of a register, left-justifying or right-justifying an arbitrary field, and simple rotates and
shifts.


4.2.1.4.1 Integer Rotate Instructions 

Integer rotate instructions rotate the contents of a register. The result of the rotation is either 
inserted into the target register under control of a mask (if a mask bit is 1 the associated bit 
of the rotated data is placed into the target register, and if the mask bit is 0 the associated 
bit in the target register is unchanged), or ANDed with a mask before being placed into the 
target register. 


Rotate left instructions allow right-rotation of the contents of a register to be performed by
a left-rotation of 32 - n, where n is the number of bits by which to rotate right.
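A minimal C sketch of this identity, assuming 32-bit rotation; rotl32 and rotr32 are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n; bits leaving bit 0 (the MSB) re-enter at bit 31. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Right rotation by n expressed as a left rotation by 32 - n. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    return rotl32(x, 32u - (n & 31u));
}
```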
The integer rotate instructions are summarized in Table 4-4.


Table 4-4. Integer Rotate Instructions

Rotate Left Word Immediate then AND with Mask (rlwinm rA,rS,SH,MB,ME)
The contents of register rS are rotated left by the number of bits specified by operand SH. A mask is generated having 1 bits from the bit specified by operand MB through the bit specified by operand ME and 0 bits elsewhere. The rotated data is ANDed with the generated mask and the result is placed into register rA.
rlwinm    Rotate Left Word Immediate then AND with Mask
rlwinm.   Rotate Left Word Immediate then AND with Mask with CR Update. The dot suffix enables the update of the CR.

Rotate Left Word then AND with Mask (rlwnm rA,rS,rB,MB,ME)
The contents of rS are rotated left by the number of bits specified by the low-order five bits of rB. A mask is generated having 1 bits from the bit specified by operand MB through the bit specified by operand ME and 0 bits elsewhere. The rotated word is ANDed with the generated mask and the result is placed into rA.
rlwnm     Rotate Left Word then AND with Mask
rlwnm.    Rotate Left Word then AND with Mask with CR Update. The dot suffix enables the update of the CR.

Rotate Left Word Immediate then Mask Insert (rlwimi rA,rS,SH,MB,ME)
The contents of rS are rotated left by the number of bits specified by operand SH. A mask is generated having 1 bits from the bit specified by operand MB through the bit specified by operand ME and 0 bits elsewhere. The rotated word is inserted into rA under control of the generated mask.
rlwimi    Rotate Left Word Immediate then Mask Insert
rlwimi.   Rotate Left Word Immediate then Mask Insert with CR Update. The dot suffix enables the update of the CR.
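The three rotate instructions can be modeled in C by combining a left rotation with the mask generator. The sketch below is illustrative; the helper names are hypothetical, and the mask follows the MB/ME rule described earlier in this section, including wraparound when MB > ME.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* 1 bits from MB through ME (bit 0 = MSB), wrapping past bit 31. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    assert(mb < 32u && me < 32u);
    uint32_t m = 0;
    unsigned i = mb;
    for (;;) {
        m |= 0x80000000u >> i;
        if (i == me)
            break;
        i = (i + 1u) & 31u;
    }
    return m;
}

/* rlwinm rA,rS,SH,MB,ME: rotate by SH, then AND with the mask. */
static uint32_t model_rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* rlwnm rA,rS,rB,MB,ME: rotate count taken from the low-order five bits of rB. */
static uint32_t model_rlwnm(uint32_t rs, uint32_t rb, unsigned mb, unsigned me)
{
    return rotl32(rs, rb & 31u) & ppc_mask(mb, me);
}

/* rlwimi rA,rS,SH,MB,ME: insert rotated bits where the mask is 1; keep rA elsewhere. */
static uint32_t model_rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}
```

For instance, model_rlwinm(x, 32 - n, n, 31) behaves like a logical right shift by n (0 < n < 32), which is how the srwi simplified mnemonic in Appendix F is expressed in terms of rlwinm.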





4.2.1.4.2 Integer Shift Instructions 

The integer shift instructions perform left and right shifts. Immediate-form logical 
(unsigned) shift operations are obtained by specifying masks and shift values for certain 
rotate instructions. Simplified mnemonics (shown in Appendix F, “Simplified
Mnemonics”) are provided to make coding of such shifts simpler and easier to understand.


Any shift right algebraic instruction, followed by addze, can be used to divide quickly by
2^n, where n is the shift count. The setting of XER[CA] by the shift right algebraic instruction is independent of mode.
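The following hypothetical C sketch shows why the addze step is needed. An algebraic right shift alone rounds a negative quotient toward minus infinity. srawi sets XER[CA] only when the source is negative and at least one 1 bit is shifted out, so adding CA with addze corrects the result to round toward zero, matching signed division. The CA rule comes from the full srawi description and is an assumption of this model.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t ra; int ca; } sra_result;

/* srawi rA,rS,SH: arithmetic right shift; CA = 1 only if rS is negative
   and at least one 1 bit was shifted out. */
static sra_result model_srawi(uint32_t rs, unsigned sh)
{
    assert(sh < 32u);
    uint32_t sign = rs & 0x80000000u;
    uint32_t shifted = rs >> sh;
    if (sign != 0u && sh != 0u)
        shifted |= ~(0xFFFFFFFFu >> sh);           /* replicate the sign bit */
    uint32_t lost = sh ? (rs & ((1u << sh) - 1u)) : 0u;
    sra_result r = { shifted, sign != 0u && lost != 0u };
    return r;
}

/* addze rD,rA: rD = (rA) + XER[CA]. */
static uint32_t model_addze(uint32_t ra, int ca)
{
    return ra + (uint32_t)ca;
}

/* Signed division by 2^sh, rounded toward zero, as srawi followed by addze. */
static int32_t div_pow2(int32_t x, unsigned sh)
{
    sra_result s = model_srawi((uint32_t)x, sh);
    return (int32_t)model_addze(s.ra, s.ca);
}
```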


Multiple-precision shifts can be programmed as shown in Appendix C, “Multiple-Precision 
Shifts.” 
The integer shift instructions are summarized in Table 4-5.


Table 4-5. Integer Shift Instructions

Shift Left Word (slw rA,rS,rB)
The contents of rS are shifted left the number of bits specified by the low-order six bits of rB. Bits shifted out of position 0 are lost. Zeros are supplied to the vacated positions on the right. The 32-bit result is placed into rA.
slw       Shift Left Word
slw.      Shift Left Word with CR Update. The dot suffix enables the update of the CR.

Shift Right Word (srw rA,rS,rB)
The contents of rS are shifted right the number of bits specified by the low-order six bits of rB. Bits shifted out of position 31 are lost. Zeros are supplied to the vacated positions on the left. The 32-bit result is placed into rA.
srw       Shift Right Word
srw.      Shift Right Word with CR Update. The dot suffix enables the update of the CR.

Shift Right Algebraic Word Immediate (srawi rA,rS,SH)
The contents of rS are shifted right the number of bits specified by operand SH. Bits shifted out of position 31 are lost. The result is sign extended and placed into rA.
srawi     Shift Right Algebraic Word Immediate
srawi.    Shift Right Algebraic Word Immediate with CR Update. The dot suffix enables the update of the CR.

Shift Right Algebraic Word (sraw rA,rS,rB)
The contents of rS are shifted right the number of bits specified by the low-order six bits of rB. Bits shifted out of position 31 are lost. The result is sign extended and placed into rA.
sraw      Shift Right Algebraic Word
sraw.     Shift Right Algebraic Word with CR Update. The dot suffix enables the update of the CR.
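Because slw, srw, and sraw take a six-bit shift count, counts from 32 to 63 shift every bit out of the word. This C sketch models that behavior; the function names are hypothetical, and for sraw it assumes that such counts leave 32 copies of the sign bit.

```c
#include <assert.h>
#include <stdint.h>

/* slw rA,rS,rB: count = low-order six bits of rB; counts 32-63 give 0. */
static uint32_t model_slw(uint32_t rs, uint32_t rb)
{
    unsigned n = rb & 0x3Fu;
    return n < 32u ? rs << n : 0u;
}

/* srw rA,rS,rB: logical right shift with the same six-bit count. */
static uint32_t model_srw(uint32_t rs, uint32_t rb)
{
    unsigned n = rb & 0x3Fu;
    return n < 32u ? rs >> n : 0u;
}

/* sraw rA,rS,rB: vacated bits take the sign; counts 32-63 fill the word with it. */
static uint32_t model_sraw(uint32_t rs, uint32_t rb)
{
    unsigned n = rb & 0x3Fu;
    uint32_t fill = (rs & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    if (n >= 32u)
        return fill;
    if (n == 0u)
        return rs;
    return (rs >> n) | (fill << (32u - n));
}
```

Note that only six bits of rB are used, so a count of 64 behaves like a count of 0.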


4.2.2 Floating-Point Instructions 


This section describes the floating-point instructions, which include the following: 





• Floating-point arithmetic instructions
• Floating-point multiply-add instructions
• Floating-point rounding and conversion instructions
• Floating-point compare instructions
• Floating-point status and control register instructions
• Floating-point move instructions


Note that MSR[FP] must be set in order for any of these instructions (including the floating-
point loads and stores) to be executed. If MSR[FP] = 0 when any floating-point instruction
is attempted, the floating-point unavailable exception is taken (see Section 6.4.8, “Floating-
Point Unavailable Exception (0x00800)”). See Section 4.2.3, “Load and Store
Instructions,” for information about floating-point loads and stores.
The PowerPC architecture supports a floating-point system as defined in the IEEE-754
standard, but requires software support to conform with that standard. Floating-point
operations conform to the IEEE-754 standard, with the exception of operations performed
with the fmadd, fres, fsel, and frsqrte instructions, or if software sets the non-IEEE mode
bit (NI) in the FPSCR. Refer to Section 3.3, “Floating-Point Execution Models—UISA,” 
for detailed information about the floating-point formats and exception conditions. Also, 
refer to Appendix D, “Floating-Point Models,” for more information on the floating-point 
execution models used by the PowerPC architecture. 


4.2.2.1 Floating-Point Arithmetic Instructions 
The floating-point arithmetic instructions are summarized in Table 4-6. 


Table 4-6. Floating-Point Arithmetic Instructions

Floating Add (Double-Precision) (fadd frD,frA,frB)
The floating-point operand in register frA is added to the floating-point operand in register frB. If the most significant bit of the resultant significand is not a one, the result is normalized. The result is rounded to the target precision under control of the floating-point rounding control field RN of the FPSCR and placed into register frD.
fadd      Floating Add (Double-Precision)
fadd.     Floating Add (Double-Precision) with CR Update. The dot suffix enables the update of the CR.

Floating Add Single (fadds frD,frA,frB)
The floating-point operand in register frA is added to the floating-point operand in register frB. If the most significant bit of the resultant significand is not a one, the result is normalized. The result is rounded to the target precision under control of the floating-point rounding control field RN of the FPSCR and placed into register frD.
fadds     Floating Add Single
fadds.    Floating Add Single with CR Update. The dot suffix enables the update of the CR.

Floating Subtract (Double-Precision) (fsub frD,frA,frB)
The floating-point operand in register frB is subtracted from the floating-point operand in register frA. If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the floating-point rounding control field RN of the FPSCR and placed into register frD.
fsub      Floating Subtract (Double-Precision)
fsub.     Floating Subtract (Double-Precision) with CR Update. The dot suffix enables the update of the CR.

Floating Subtract Single (fsubs frD,frA,frB)
The floating-point operand in register frB is subtracted from the floating-point operand in register frA. If the most significant bit of the resultant significand is not 1, the result is normalized. The result is rounded to the target precision under control of the floating-point rounding control field RN of the FPSCR and placed into frD.
fsubs     Floating Subtract Single
fsubs.    Floating Subtract Single with CR Update. The dot suffix enables the update of the CR.
Table 4-6. Floating-Point Arithmetic Instructions (Continued)

Floating Multiply (Double-Precision) (fmul frD,frA,frC)
The floating-point operand in register frA is multiplied by the floating-point operand in register frC.
fmul      Floating Multiply (Double-Precision)
fmul.     Floating Multiply (Double-Precision) with CR Update. The dot suffix enables the update of the CR.

Floating Multiply Single (fmuls frD,frA,frC)
The floating-point operand in register frA is multiplied by the floating-point operand in register frC.
fmuls     Floating Multiply Single
fmuls.    Floating Multiply Single with CR Update. The dot suffix enables the update of the CR.

Floating Divide (Double-Precision) (fdiv frD,frA,frB)
The floating-point operand in register frA is divided by the floating-point operand in register frB. No remainder is preserved.
fdiv      Floating Divide (Double-Precision)
fdiv.     Floating Divide (Double-Precision) with CR Update. The dot suffix enables the update of the CR.

Floating Divide Single (fdivs frD,frA,frB)
The floating-point operand in register frA is divided by the floating-point operand in register frB. No remainder is preserved.
fdivs     Floating Divide Single
fdivs.    Floating Divide Single with CR Update. The dot suffix enables the update of the CR.

Floating Square Root (Double-Precision) (fsqrt frD,frB)
The square root of the floating-point operand in register frB is placed into register frD.
fsqrt     Floating Square Root (Double-Precision)
fsqrt.    Floating Square Root (Double-Precision) with CR Update. The dot suffix enables the update of the CR.
This instruction is optional.

Floating Square Root Single (fsqrts frD,frB)
The square root of the floating-point operand in register frB is placed into register frD.
fsqrts    Floating Square Root Single
fsqrts.   Floating Square Root Single with CR Update. The dot suffix enables the update of the CR.
This instruction is optional.

Floating Reciprocal Estimate Single (fres frD,frB)
A single-precision estimate of the reciprocal of the floating-point operand in register frB is placed into frD. The estimate placed into frD is correct to a precision of one part in 256 of the reciprocal of frB.
fres      Floating Reciprocal Estimate Single
fres.     Floating Reciprocal Estimate Single with CR Update. The dot suffix enables the update of the CR.
This instruction is optional.
Table 4-6. Floating-Point Arithmetic Instructions (Continued)

Floating Reciprocal Square Root Estimate (frsqrte frD,frB)
A double-precision estimate of the reciprocal of the square root of the floating-point operand in register frB is placed into frD. The estimate placed into frD is correct to a precision of one part in 32 of the reciprocal of the square root of frB.
frsqrte   Floating Reciprocal Square Root Estimate
frsqrte.  Floating Reciprocal Square Root Estimate with CR Update. The dot suffix enables the update of the CR.
This instruction is optional.

Floating Select (fsel frD,frA,frC,frB)
The floating-point operand in frA is compared to the value zero. If the operand is greater than or equal to zero, frD is set to the contents of frC. If the operand is less than zero or is a NaN, frD is set to the contents of frB. The comparison ignores the sign of zero (that is, regards +0 as equal to -0).
fsel      Floating Select
fsel.     Floating Select with CR Update. The dot suffix enables the update of the CR.
This instruction is optional.
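The C sketch below illustrates two points from Table 4-6 under stated assumptions. model_fsel reproduces the fsel selection rule with an ordinary C comparison, which is false for a NaN and treats -0.0 as equal to 0.0. refine_reciprocal shows one Newton-Raphson step of the kind software commonly applies to a fres estimate; it is a generic technique, not a sequence defined in this manual. If the estimate has relative error e, one step leaves an error of about e squared, so an estimate good to one part in 256 becomes good to roughly one part in 65536.

```c
#include <assert.h>
#include <math.h>

/* fsel frD,frA,frC,frB: frD = (frA >= 0) ? frC : frB.
   A NaN compares false and so selects frB; -0.0 >= 0.0 is true. */
static double model_fsel(double fra, double frc, double frb)
{
    return (fra >= 0.0) ? frc : frb;
}

/* One Newton-Raphson step for an estimate y0 of 1/x: y1 = y0 * (2 - x * y0).
   If y0 = (1/x)(1 + e), then y1 = (1/x)(1 - e*e). */
static double refine_reciprocal(double x, double y0)
{
    return y0 * (2.0 - x * y0);
}
```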





4.2.2.2 Floating-Point Multiply-Add Instructions 

These instructions combine multiply and add operations without an intermediate rounding 
operation. The fractional part of the intermediate product is 106 bits wide, and all 106 bits 
take part in the add/subtract portion of the instruction. 


Status bits are set as follows: 


• Overflow, underflow, and inexact exception bits, the FR and FI bits, and the FPRF
field are set based on the final result of the operation, and not on the result of the
multiplication.

• Invalid operation exception bits are set as if the multiplication and the addition were
performed using two separate instructions (fmuls, followed by fadds or fsubs). That
is, multiplication of infinity by zero or of anything by an SNaN, and/or addition of
an SNaN, cause the corresponding exception bits to be set.


The floating-point multiply-add instructions are summarized in Table 4-7. 


Table 4-7. Floating-Point Multiply-Add Instructions

Floating Multiply-Add (Double-Precision) (fmadd frD,frA,frC,frB)
The floating-point operand in register frA is multiplied by the floating-point operand in register frC. The floating-point operand in register frB is added to this intermediate result.
fmadd     Floating Multiply-Add (Double-Precision)
fmadd.    Floating Multiply-Add (Double-Precision) with CR Update. The dot suffix enables the update of the CR.
Table 4-7. Floating-Point Multiply-Add Instructions (Continued)

Floating Multiply-Add Single (fmadds frD,frA,frC,frB)
The floating-point operand in register frA is multiplied by the floating-point operand in register frC. The floating-point operand in register frB is added to this intermediate result.
fmadds    Floating Multiply-Add Single
fmadds.   Floating Multiply-Add Single with CR Update. The dot suffix enables the update of the CR.

Floating Multiply-Subtract (Double-Precision) (fmsub frD,frA,frC,frB)
The floating-point operand in register frA is multiplied by the floating-point operand in register frC. The floating-point operand in register frB is subtracted from this intermediate result.
fmsub     Floating Multiply-Subtract (Double-Precision)
fmsub.    Floating Multiply-Subtract (Double-Precision) with CR Update. The dot suffix enables the update of the CR.

Floating Multiply-Subtract Single (fmsubs frD,frA,frC,frB)
The floating-point operand in register frA is multiplied by the floating-point operand in register frC. The floating-point operand in register frB is subtracted from this intermediate result.
fmsubs    Floating Multiply-Subtract Single
fmsubs.   Floating Multiply-Subtract Single with CR Update. The dot suffix enables the update of the CR.

Floating Negative Multiply-Add (Double-Precision) (fnmadd frD,frA,frC,frB)
The floating-point operand in register frA is multiplied by the floating-point operand in register frC. The floating-point operand in register frB is added to this intermediate result, and the rounded result is negated.
fnmadd    Floating Negative Multiply-Add (Double-Precision)
fnmadd.   Floating Negative Multiply-Add (Double-Precision) with CR Update. The dot suffix enables update of the CR.

Floating Negative Multiply-Add Single (fnmadds frD,frA,frC,frB)
The floating-point operand in register frA is multiplied by the floating-point operand in register frC. The floating-point operand in register frB is added to this intermediate result, and the rounded result is negated.
fnmadds   Floating Negative Multiply-Add Single
fnmadds.  Floating Negative Multiply-Add Single with CR Update. The dot suffix enables the update of the CR.

Floating Negative Multiply-Subtract (Double-Precision) (fnmsub frD,frA,frC,frB)
The floating-point operand in register frA is multiplied by the floating-point operand in register frC. The floating-point operand in register frB is subtracted from this intermediate result, and the rounded result is negated.
fnmsub    Floating Negative Multiply-Subtract (Double-Precision)
fnmsub.   Floating Negative Multiply-Subtract (Double-Precision) with CR Update. The dot suffix enables the update of the CR.

Floating Negative Multiply-Subtract Single (fnmsubs frD,frA,frC,frB)
The floating-point operand in register frA is multiplied by the floating-point operand in register frC. The floating-point operand in register frB is subtracted from this intermediate result, and the rounded result is negated.
fnmsubs   Floating Negative Multiply-Subtract Single
fnmsubs.  Floating Negative Multiply-Subtract Single with CR Update. The dot suffix enables the update of the CR.





For more information on multiply-add instructions, refer to Section D.2, “Execution Model 
for Multiply-Add Type Instructions.” 
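In formula form, and assuming the round-then-negate behavior given in the full instruction descriptions, the double-precision forms compute the following. Here round denotes the single rounding to the target precision under FPSCR[RN], and the product is kept exact until after the add or subtract. The single-precision forms (fmadds, fmsubs, fnmadds, fnmsubs) differ only in rounding to single precision. The full descriptions make an exception for NaN results, whose sign bit is not changed by the negation.

```latex
\begin{aligned}
\texttt{fmadd}:&\quad \mathrm{frD} \leftarrow \mathrm{round}\big((\mathrm{frA}) \times (\mathrm{frC}) + (\mathrm{frB})\big) \\
\texttt{fmsub}:&\quad \mathrm{frD} \leftarrow \mathrm{round}\big((\mathrm{frA}) \times (\mathrm{frC}) - (\mathrm{frB})\big) \\
\texttt{fnmadd}:&\quad \mathrm{frD} \leftarrow -\,\mathrm{round}\big((\mathrm{frA}) \times (\mathrm{frC}) + (\mathrm{frB})\big) \\
\texttt{fnmsub}:&\quad \mathrm{frD} \leftarrow -\,\mathrm{round}\big((\mathrm{frA}) \times (\mathrm{frC}) - (\mathrm{frB})\big)
\end{aligned}
```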
4.2.2.3 Floating-Point Rounding and Conversion Instructions


The Floating Round to Single-Precision (frsp) instruction is used to round a 64-bit
double-precision number to a 32-bit single-precision floating-point number. The floating-
point convert instructions convert a 64-bit double-precision floating-point number to a 32-
bit signed integer number.


The PowerPC architecture defines bits 0-31 of floating-point register frD as undefined 
when executing the Floating Convert to Integer Word (fctiw) and Floating Convert to 
Integer Word with Round toward Zero (fctiwz) instructions. The floating-point rounding
instructions are shown in Table 4-8. 


Examples of uses of these instructions to perform various conversions can be found in 
Appendix D, “Floating-Point Models.” 


Table 4-8. Floating-Point Rounding and Conversion Instructions

Floating Round to Single-Precision (frsp frD,frB)
The floating-point operand in frB is rounded to single-precision using the rounding mode specified by FPSCR[RN] and placed into frD.
frsp      Floating Round to Single-Precision
frsp.     Floating Round to Single-Precision with CR Update. The dot suffix enables the update of the CR.

Floating Convert to Integer Word (fctiw frD,frB)
The floating-point operand in register frB is converted to a 32-bit signed integer, using the rounding mode specified by FPSCR[RN], and placed in the low-order 32 bits of frD. Bits 0-31 of frD are undefined.
fctiw     Floating Convert to Integer Word
fctiw.    Floating Convert to Integer Word with CR Update. The dot suffix enables the update of the CR.

Floating Convert to Integer Word with Round toward Zero (fctiwz frD,frB)
The floating-point operand in register frB is converted to a 32-bit signed integer, using the rounding mode Round toward Zero, and placed in the low-order 32 bits of frD. Bits 0-31 of frD are undefined.
fctiwz    Floating Convert to Integer Word with Round toward Zero
fctiwz.   Floating Convert to Integer Word with Round toward Zero with CR Update. The dot suffix enables the update of the CR.
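A rough C model of frsp and fctiwz is shown below. It relies on C's double-to-float conversion, which rounds in the current rounding mode (round-to-nearest by default), and on C's truncating floating-to-integer conversion. The saturation values for out-of-range operands and the result for a NaN (0x8000_0000) are assumptions based on the full fctiwz description, not on the table above; the function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* frsp under the default rounding mode: round to single, then hold the value
   in double format again, as an FPR does. */
static double model_frsp(double frb)
{
    return (double)(float)frb;
}

/* fctiwz: 32-bit signed integer, rounded toward zero (low-order word of frD). */
static int32_t model_fctiwz(double frb)
{
    if (isnan(frb))
        return INT32_MIN;               /* assumed: 0x8000_0000 for NaN */
    if (frb >= 2147483648.0)
        return INT32_MAX;               /* assumed: saturate to 0x7FFF_FFFF */
    if (frb <= -2147483649.0)
        return INT32_MIN;               /* assumed: saturate to 0x8000_0000 */
    return (int32_t)frb;                /* C conversion truncates toward zero */
}
```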





4.2.2.4 Floating-Point Compare Instructions 

Floating-point compare instructions compare the contents of two floating-point registers,
and the comparison ignores the sign of zero (that is, +0 = -0). The comparison can be
ordered or unordered. The comparison sets one bit in the designated CR field and clears the 
other three bits. The FPCC (floating-point condition code) in bits 16-19 of the FPSCR 
(floating-point status and control register) is set in the same way. 
The CR field and the FPCC are interpreted as shown in Table 4-9.


Table 4-9. CR Bit Settings

Bit   Name   Description
0     FL     (frA) < (frB)
1     FG     (frA) > (frB)
2     FE     (frA) = (frB)
3     FU     (frA) ? (frB) (unordered)


The floating-point compare instructions are summarized in Table 4-10. 





Table 4-10. Floating-Point Compare Instructions

Floating Compare Unordered (fcmpu crfD,frA,frB)
The floating-point operand in frA is compared to the floating-point operand in frB. The result of the compare is placed into crfD and the FPCC.

Floating Compare Ordered (fcmpo crfD,frA,frB)
The floating-point operand in frA is compared to the floating-point operand in frB. The result of the compare is placed into crfD and the FPCC.
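The following hypothetical C sketch computes the 4-bit result of Table 4-9 that a floating-point compare writes to crfD and to FPSCR[FPCC]. fcmpu and fcmpo produce the same result bits; they differ in which invalid-operation exception bits are set for NaN operands, which this sketch does not model.

```c
#include <assert.h>
#include <math.h>

/* Result bits in the order of Table 4-9 (bit 0 = most significant). */
enum { FL = 0x8, FG = 0x4, FE = 0x2, FU = 0x1 };

/* 4-bit value a floating-point compare writes to crfD and FPSCR[FPCC]. */
static unsigned fp_compare(double fra, double frb)
{
    if (isnan(fra) || isnan(frb))
        return FU;              /* unordered */
    if (fra < frb)
        return FL;
    if (fra > frb)
        return FG;
    return FE;                  /* also covers +0 compared with -0 */
}
```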


4.2.2.5 Floating-Point Status and Control Register Instructions 


Every FPSCR instruction appears to synchronize the effects of all floating-point 
instructions executed by a given processor. Executing an FPSCR instruction ensures that all 
floating-point instructions previously initiated by the given processor appear to have 
completed before the FPSCR instruction is initiated and that no subsequent floating-point 
instructions appear to be initiated by the given processor until the FPSCR instruction has 
completed. In particular: 





• All exceptions caused by the previously initiated instructions are recorded in the
FPSCR before the FPSCR instruction is initiated.

• All invocations of the floating-point exception handler caused by the previously
initiated instructions have occurred before the FPSCR instruction is initiated.

• No subsequent floating-point instruction that depends on or alters the settings of any
FPSCR bits appears to be initiated until the FPSCR instruction has completed.


Floating-point memory access instructions are not affected by the execution of the FPSCR 
instructions. 
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The FPSCR instructions are summarized in Table 4-11. 


Table 4-11. Floating-Point Status and Control Register Instructions

Name | Mnemonic | Operand Syntax | Operation
Move from FPSCR | mffs / mffs. | frD | The contents of the FPSCR are placed into bits 32-63 of frD. Bits 0-31 of frD are undefined.
Move to Condition Register from FPSCR | mcrfs | crfD,crfS | The contents of the FPSCR field specified by operand crfS are copied to the CR field specified by operand crfD. All exception bits copied (except the FEX and VX bits) are cleared in the FPSCR.
Move to FPSCR Field Immediate | mtfsfi / mtfsfi. | crfD,IMM | The contents of the IMM field are placed into FPSCR field crfD. The contents of FPSCR[FX] are altered only if crfD = 0.
Move to FPSCR Fields | mtfsf / mtfsf. | FM,frB | Bits 32-63 of frB are placed into the FPSCR under control of the field mask specified by FM. The field mask identifies the 4-bit fields affected. Let i be an integer in the range 0-7. If FM[i] = 1, FPSCR field i (FPSCR bits 4*i through 4*i+3) is set to the contents of the corresponding field of the low-order 32 bits of frB. The contents of FPSCR[FX] are altered only if FM[0] = 1.
Move to FPSCR Bit 0 | mtfsb0 / mtfsb0. | crbD | The FPSCR bit location specified by operand crbD is cleared. Bits 1 and 2 (FEX and VX) cannot be reset explicitly.
Move to FPSCR Bit 1 | mtfsb1 / mtfsb1. | crbD | The FPSCR bit location specified by operand crbD is set. Bits 1 and 2 (FEX and VX) cannot be set explicitly.

For mffs, mtfsfi, mtfsf, mtfsb0, and mtfsb1, the dot-suffix form (for example, mffs.) enables the update of the CR.
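The field-mask and single-bit rules in Table 4-11 can be modeled on a 32-bit FPSCR image. In this hypothetical sketch, mtfsf and mtfsb are illustrative helper names, and bit 0 is the most significant bit, as elsewhere in this book. It simplifies two things real hardware handles: it copies field 0 verbatim without recomputing the FEX and VX summary bits, and it treats an mtfsb0/mtfsb1 aimed at bit 1 or 2 as a no-op.

```python
MASK32 = 0xFFFFFFFF


def mtfsf(fpscr: int, fm: int, frb_low: int) -> int:
    """Merge the FPSCR fields selected by the 8-bit field mask FM.

    FM[0] is the most significant mask bit and selects FPSCR field 0
    (bits 0-3, the most significant nibble in big-endian bit numbering).
    """
    result = fpscr
    for i in range(8):
        if (fm >> (7 - i)) & 1:          # FM[i] = 1
            shift = 28 - 4 * i           # field i = FPSCR bits 4*i..4*i+3
            field = 0xF << shift
            result = (result & ~field) | (frb_low & field)
    return result & MASK32


def mtfsb(fpscr: int, crbd: int, value: int) -> int:
    """Set (value = 1) or clear (value = 0) FPSCR bit crbD."""
    if crbd in (1, 2):                   # FEX and VX cannot be written explicitly
        return fpscr
    bit = 1 << (31 - crbd)               # bit 0 is the most significant bit
    return (fpscr | bit) if value else (fpscr & ~bit & MASK32)
```

Because FX is bit 0, it lies in field 0, so the rule that FPSCR[FX] is altered only when FM[0] = 1 falls out of the masking directly.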
4.2.2.6 Floating-Point Move Instructions


Floating-point move instructions copy data from one FPR to another, altering the sign bit 
(bit 0) as described for the fneg, fabs, and fnabs instructions in Table 4-12. The fneg, fabs, 
and fnabs instructions may alter the sign bit of a NaN. The floating-point move instructions 
do not modify the FPSCR. The CR update option in these instructions controls the placing 
of result status into CR1. If the CR update option is enabled, CR1 is set; otherwise, CR1 is 
unchanged. 


Table 4-12 provides a summary of the floating-point move instructions. 


Table 4-12. Floating-Point Move Instructions

Name | Mnemonic | Operand Syntax | Operation
Floating Move Register | fmr / fmr. | frD,frB | The contents of frB are placed into frD.
Floating Negate | fneg / fneg. | frD,frB | The contents of frB with bit 0 inverted are placed into frD.
Floating Absolute Value | fabs / fabs. | frD,frB | The contents of frB with bit 0 cleared are placed into frD.
Floating Negative Absolute Value | fnabs / fnabs. | frD,frB | The contents of frB with bit 0 set are placed into frD.

For each instruction, the dot-suffix form (for example, fmr.) enables the update of the CR.
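Because these instructions only manipulate bit 0, they can be sketched on the raw 64-bit FPR image. The helpers below (to_bits, from_bits, fneg, fabs, fnabs) are illustrative names assumed for this example; working on bit patterns rather than Python floats makes it visible that a NaN's sign bit changes and that no FPSCR state is consulted.

```python
import struct

SIGN = 1 << 63  # FPR bit 0 (big-endian numbering) is the most significant bit


def to_bits(x: float) -> int:
    """Return the 64-bit IEEE double image of x."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def from_bits(b: int) -> float:
    """Reinterpret a 64-bit image as an IEEE double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def fneg(frb: int) -> int:
    return frb ^ SIGN   # invert bit 0


def fabs(frb: int) -> int:
    return frb & ~SIGN  # clear bit 0


def fnabs(frb: int) -> int:
    return frb | SIGN   # set bit 0
```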





4.2.3 Load and Store Instructions 


Load and store instructions are issued and translated in program order; however, the 
accesses can occur out of order. Synchronizing instructions are provided to enforce strict 
ordering. This section describes the load and store instructions, which consist of the 
following: 

• Integer load instructions
• Integer store instructions
• Integer load and store with byte-reverse instructions
• Integer load and store multiple instructions
• Integer load and store string instructions
• Floating-point load instructions
• Floating-point store instructions
• Memory synchronization instructions
4.2.3.1 Integer Load and Store Address Generation

Integer load and store operations generate effective addresses using register indirect with immediate index mode, register indirect with index mode, or register indirect mode. See Section 4.1.4.2, “Effective Address Calculation,” for information about calculating effective addresses. Note that in some implementations, operations that are not naturally aligned may suffer performance degradation. Refer to Section 6.4.6.1, “Integer Alignment Exceptions,” for additional information about load and store address alignment exceptions.


4.2.3.1.1 Register Indirect with Immediate Index Addressing for Integer 
Loads and Stores 

Instructions using this addressing mode contain a signed 16-bit immediate index (d operand), which is sign-extended and added to the contents of a general-purpose register specified in the instruction (rA operand) to generate the effective address. If the rA field of the instruction specifies r0, a value of zero is added to the immediate index (d operand) in place of the contents of r0. The option to specify rA or 0 is shown in the instruction descriptions as (rA|0).


Figure 4-1 shows how an effective address is generated when using register indirect with 
immediate index addressing. 


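[Figure 4-1: instruction encoding opcode (bits 0-5), rD/rS (bits 6-10), rA (bits 11-15), d (bits 16-31). The d field is sign-extended and added to (rA|0) to form the effective address, which the memory interface uses to load into or store from the GPR specified by rD/rS.]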


Figure 4-1. Register Indirect with Immediate Index Addressing for Integer 
Loads/Stores 
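The effective-address rule for this mode can be written as a short Python function. It is a sketch under the assumption of 32-bit effective addresses (the sum wraps modulo 2^32); gpr is a hypothetical list standing in for the 32 GPRs, and exts16 and ea_d_form are illustrative names.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit effective addresses


def exts16(d: int) -> int:
    """Sign-extend a 16-bit immediate field."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def ea_d_form(gpr: list, ra: int, d: int) -> int:
    """EA = (rA|0) + EXTS(d): r0 contributes zero rather than its contents."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + exts16(d)) & MASK32
```

For example, with r3 = 0x1000 and d = 0xFFFC (that is, -4), the EA is 0x0FFC; with rA = r0, the same d yields 0xFFFFFFFC no matter what r0 holds.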
4.2.3.1.2 Register Indirect with Index Addressing for Integer Loads and
Stores 

Instructions using this addressing mode cause the contents of two general-purpose registers (specified as operands rA and rB) to be added in the generation of the effective address. A zero in place of the rA operand causes a zero to be added to the contents of the general-purpose register specified in operand rB (or the value zero for lswi and stswi instructions). The option to specify rA or 0 is shown in the instruction descriptions as (rA|0).


Figure 4-2 shows how an effective address is generated when using register indirect with 
index addressing. 
Figure 4-2. Register Indirect with Index Addressing for Integer Loads/Stores
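A similar sketch for register indirect with index addressing, again assuming 32-bit effective addresses and a hypothetical gpr list (ea_x_form is an illustrative name). Note that the zero substitution applies only to rA; an rB of r0 supplies the contents of r0.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit effective addresses


def ea_x_form(gpr: list, ra: int, rb: int) -> int:
    """EA = (rA|0) + (rB) for register indirect with index addressing."""
    base = 0 if ra == 0 else gpr[ra]  # only rA gets the r0 -> 0 substitution
    return (base + gpr[rb]) & MASK32
```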


4.2.3.1.3 Register Indirect Addressing for Integer Loads and Stores 
Instructions using this addressing mode use the contents of the general-purpose register specified by the rA operand as the effective address. A zero in the rA operand causes an effective address of zero to be generated. The option to specify rA or 0 is shown in the instruction descriptions as (rA|0).


Figure 4-3 shows how an effective address is generated when using register indirect 
addressing. 
Figure 4-3. Register Indirect Addressing for Integer Loads/Stores


4.2.3.2 Integer Load Instructions 

For integer load instructions, the byte, half word, word, or double word addressed by the EA (effective address) is loaded into rD. Many integer load instructions have an update form, in which rA is updated with the generated effective address. For these forms, if rA ≠ 0 and rA ≠ rD (otherwise invalid), the EA is placed into rA and the memory element (byte, half word, word, or double word) addressed by the EA is loaded into rD. Note that the PowerPC architecture defines load with update instructions with operand rA = 0 or rA = rD as invalid forms.
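The update-form rules, including the invalid-form check, can be sketched for lwzu. This is an illustrative model assuming 32-bit registers and big-endian memory held in a Python byte buffer; lwzu and exts16 are hypothetical helpers, and the sketch simply rejects invalid forms rather than modeling what any particular processor does with them.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers and effective addresses


def exts16(d: int) -> int:
    """Sign-extend a 16-bit immediate field."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def lwzu(gpr: list, mem: bytes, rd: int, ra: int, d: int) -> None:
    """Model lwzu rD,d(rA): load the word at (rA) + EXTS(d), then write the EA to rA."""
    if ra == 0 or ra == rd:
        raise ValueError("invalid form: rA = 0 or rA = rD")
    ea = (gpr[ra] + exts16(d)) & MASK32
    gpr[rd] = int.from_bytes(mem[ea:ea + 4], "big")  # default big-endian order
    gpr[ra] = ea
```

Repeating lwzu with the same d walks rA through an array, one element per instruction.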


The default byte and bit ordering is big-endian in the PowerPC architecture; see 
Section 3.1.2, “Byte Ordering,” for information about little-endian byte ordering. 


Note that in some implementations of the architecture, the load algebraic instructions (lha, lhax, lwa, lwax) and the load with update instructions (lbzu, lbzux, lhzu, lhzux, lhau, lhaux, lwaux, ldu, ldux) may execute with greater latency than other types of load instructions. Moreover, the load with update instructions may take longer to execute in some implementations than the corresponding pair of a nonupdate load followed by an add instruction.
Table 4-13 summarizes the integer load instructions.

Table 4-13. Integer Load Instructions

Name | Mnemonic | Operand Syntax | Operation
Load Byte and Zero | lbz | rD,d(rA) | The EA is the sum (rA|0) + d. The byte in memory addressed by the EA is loaded into the low-order eight bits of rD. The remaining bits in rD are cleared.
Load Byte and Zero Indexed | lbzx | rD,rA,rB | The EA is the sum (rA|0) + (rB). The byte in memory addressed by the EA is loaded into the low-order eight bits of rD. The remaining bits in rD are cleared.
Load Byte and Zero with Update | lbzu | rD,d(rA) | The EA is the sum (rA) + d. The byte in memory addressed by the EA is loaded into the low-order eight bits of rD. The remaining bits in rD are cleared. The EA is placed into rA.
Load Byte and Zero with Update Indexed | lbzux | rD,rA,rB | The EA is the sum (rA) + (rB). The byte in memory addressed by the EA is loaded into the low-order eight bits of rD. The remaining bits in rD are cleared. The EA is placed into rA.
Load Half Word and Zero | lhz | rD,d(rA) | The EA is the sum (rA|0) + d. The half word in memory addressed by the EA is loaded into the low-order 16 bits of rD. The remaining bits in rD are cleared.
Load Half Word and Zero Indexed | lhzx | rD,rA,rB | The EA is the sum (rA|0) + (rB). The half word in memory addressed by the EA is loaded into the low-order 16 bits of rD. The remaining bits in rD are cleared.
Load Half Word and Zero with Update | lhzu | rD,d(rA) | The EA is the sum (rA) + d. The half word in memory addressed by the EA is loaded into the low-order 16 bits of rD. The remaining bits in rD are cleared. The EA is placed into rA.
Load Half Word and Zero with Update Indexed | lhzux | rD,rA,rB | The EA is the sum (rA) + (rB). The half word in memory addressed by the EA is loaded into the low-order 16 bits of rD. The remaining bits in rD are cleared. The EA is placed into rA.
Load Half Word Algebraic | lha | rD,d(rA) | The EA is the sum (rA|0) + d. The half word in memory addressed by the EA is loaded into the low-order 16 bits of rD. The remaining bits in rD are filled with a copy of the most significant bit of the loaded half word.
Load Half Word Algebraic Indexed | lhax | rD,rA,rB | The EA is the sum (rA|0) + (rB). The half word in memory addressed by the EA is loaded into the low-order 16 bits of rD. The remaining bits in rD are filled with a copy of the most significant bit of the loaded half word.
Load Half Word Algebraic with Update | lhau | rD,d(rA) | The EA is the sum (rA) + d. The half word in memory addressed by the EA is loaded into the low-order 16 bits of rD. The remaining bits in rD are filled with a copy of the most significant bit of the loaded half word. The EA is placed into rA.
Load Half Word Algebraic with Update Indexed | lhaux | rD,rA,rB | The EA is the sum (rA) + (rB). The half word in memory addressed by the EA is loaded into the low-order 16 bits of rD. The remaining bits in rD are filled with a copy of the most significant bit of the loaded half word. The EA is placed into rA.
Load Word and Zero | lwz | rD,d(rA) | The EA is the sum (rA|0) + d. The word in memory addressed by the EA is loaded into rD.
Load Word and Zero Indexed | lwzx | rD,rA,rB | The EA is the sum (rA|0) + (rB). The word in memory addressed by the EA is loaded into rD.
Load Word and Zero with Update | lwzu | rD,d(rA) | The EA is the sum (rA) + d. The word in memory addressed by the EA is loaded into rD. The EA is placed into rA.
Load Word and Zero with Update Indexed | lwzux | rD,rA,rB | The EA is the sum (rA) + (rB). The word in memory addressed by the EA is loaded into rD. The EA is placed into rA.
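The difference between the "and zero" and "algebraic" loads in Table 4-13 comes down to how the bits above the loaded value are filled. The sketch below assumes 32-bit rD images and big-endian memory; the helper names are hypothetical.

```python
def lbz_result(mem: bytes, ea: int) -> int:
    """Byte loaded into the low-order 8 bits of rD; remaining bits cleared."""
    return mem[ea]


def lhz_result(mem: bytes, ea: int) -> int:
    """Half word loaded into the low-order 16 bits of rD; remaining bits cleared."""
    return int.from_bytes(mem[ea:ea + 2], "big")


def lha_result(mem: bytes, ea: int) -> int:
    """Half word loaded into the low-order 16 bits; bits 0-15 copy its most significant bit."""
    half = int.from_bytes(mem[ea:ea + 2], "big")
    return (half | 0xFFFF0000) if half & 0x8000 else half
```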


4.2.3.3 Integer Store Instructions 

For integer store instructions, the contents of rS are stored into the byte, half word, word or 
double word in memory addressed by the EA (effective address). Many store instructions 
have an update form, in which rA is updated with the EA. For these forms, the following 
rules apply: 


• If rA ≠ 0, the effective address is placed into rA.

• If rS = rA, the contents of register rS are copied to the target memory element, then the generated EA is placed into rA (rS).


In general, the PowerPC architecture defines a sequential execution model. However, when 
a store instruction modifies a memory location that contains an instruction, software 
synchronization is required to ensure that subsequent instruction fetches from that location 
obtain the modified version of the instruction. 


If a program modifies the instructions it intends to execute, it should call the appropriate 
system library program before attempting to execute the modified instructions to ensure 
that the modifications have taken effect with respect to instruction fetching. 


The PowerPC architecture defines store with update instructions with rA = 0 as an invalid form. In addition, it defines integer store instructions with the CR update option enabled (Rc field, bit 31, in the instruction encoding = 1) to be an invalid form. Table 4-14 provides a summary of the integer store instructions.


Table 4-14. Integer Store Instructions

Name | Mnemonic | Operand Syntax | Operation
Store Byte | stb | rS,d(rA) | The EA is the sum (rA|0) + d. The contents of the low-order eight bits of rS are stored into the byte in memory addressed by the EA.
Store Byte Indexed | stbx | rS,rA,rB | The EA is the sum (rA|0) + (rB). The contents of the low-order eight bits of rS are stored into the byte in memory addressed by the EA.
Store Byte with Update | stbu | rS,d(rA) | The EA is the sum (rA) + d. The contents of the low-order eight bits of rS are stored into the byte in memory addressed by the EA. The EA is placed into rA.
Store Byte with Update Indexed | stbux | rS,rA,rB | The EA is the sum (rA) + (rB). The contents of the low-order eight bits of rS are stored into the byte in memory addressed by the EA. The EA is placed into rA.
Store Half Word | sth | rS,d(rA) | The EA is the sum (rA|0) + d. The contents of the low-order 16 bits of rS are stored into the half word in memory addressed by the EA.
Store Half Word Indexed | sthx | rS,rA,rB | The EA is the sum (rA|0) + (rB). The contents of the low-order 16 bits of rS are stored into the half word in memory addressed by the EA.
Store Half Word with Update | sthu | rS,d(rA) | The EA is the sum (rA) + d. The contents of the low-order 16 bits of rS are stored into the half word in memory addressed by the EA. The EA is placed into rA.
Store Half Word with Update Indexed | sthux | rS,rA,rB | The EA is the sum (rA) + (rB). The contents of the low-order 16 bits of rS are stored into the half word in memory addressed by the EA. The EA is placed into rA.
Store Word | stw | rS,d(rA) | The EA is the sum (rA|0) + d. The contents of rS are stored into the word in memory addressed by the EA.
Store Word Indexed | stwx | rS,rA,rB | The EA is the sum (rA|0) + (rB). The contents of rS are stored into the word in memory addressed by the EA.
Store Word with Update | stwu | rS,d(rA) | The EA is the sum (rA) + d. The contents of rS are stored into the word in memory addressed by the EA. The EA is placed into rA.
Store Word with Update Indexed | stwux | rS,rA,rB | The EA is the sum (rA) + (rB). The contents of rS are stored into the word in memory addressed by the EA. The EA is placed into rA.
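The rule that rS is stored before rA is updated matters when rS = rA. This sketch of stwu assumes 32-bit registers and big-endian memory in a Python bytearray; stwu and exts16 are hypothetical helpers.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers and effective addresses


def exts16(d: int) -> int:
    """Sign-extend a 16-bit immediate field."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def stwu(gpr: list, mem: bytearray, rs: int, ra: int, d: int) -> None:
    """Model stwu rS,d(rA): store rS at (rA) + EXTS(d), then write the EA to rA."""
    if ra == 0:
        raise ValueError("invalid form: rA = 0")
    ea = (gpr[ra] + exts16(d)) & MASK32
    # rS is read before rA is updated, so when rS = rA the old value is stored.
    mem[ea:ea + 4] = (gpr[rs] & MASK32).to_bytes(4, "big")
    gpr[ra] = ea
```

The test case mirrors the stwu r1,-8(r1) pattern that PowerPC ABIs commonly use to allocate a stack frame while saving the old stack pointer as a back chain.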





4.2.3.4 Integer Load and Store with Byte-Reverse Instructions 

Table 4-15 describes integer load and store with byte-reverse instructions. Note that in 
some PowerPC implementations, load byte-reverse instructions may have greater latency 
than other load instructions. 


When used in a PowerPC system operating with the default big-endian byte order, these 
instructions have the effect of loading and storing data in little-endian order. Likewise, 
when used in a PowerPC system operating with little-endian byte order, these instructions 
have the effect of loading and storing data in big-endian order. For more information about 
big-endian and little-endian byte ordering, see Section 3.1.2, “Byte Ordering.” 
Table 4-15. Integer Load and Store with Byte-Reverse Instructions

Name | Mnemonic | Operand Syntax | Operation
Load Half Word Byte-Reverse Indexed | lhbrx | rD,rA,rB | The EA is the sum (rA|0) + (rB). Bits 0-7 of the half word in memory addressed by the EA are loaded into the low-order eight bits of rD. Bits 8-15 of the half word are loaded into bits 16-23 of rD. The remaining bits in rD are cleared.
Load Word Byte-Reverse Indexed | lwbrx | rD,rA,rB | The EA is the sum (rA|0) + (rB). Bits 0-7 of the word in memory addressed by the EA are loaded into the low-order eight bits of rD. Bits 8-15 of the word in memory addressed by the EA are loaded into bits 16-23 of rD. Bits 16-23 of the word in memory addressed by the EA are loaded into bits 8-15 of rD. Bits 24-31 of the word in memory addressed by the EA are loaded into bits 0-7 of rD. The remaining bits in rD are cleared.
Store Half Word Byte-Reverse Indexed | sthbrx | rS,rA,rB | The EA is the sum (rA|0) + (rB). The contents of the low-order eight bits of rS are stored into bits 0-7 of the half word in memory addressed by the EA. The contents of the next eight lower-order bits of rS are stored into bits 8-15 of the half word.
Store Word Byte-Reverse Indexed | stwbrx | rS,rA,rB | The EA is the sum (rA|0) + (rB). The contents of the low-order eight bits of rS are stored into bits 0-7 of the word in memory addressed by the EA. The contents of the next eight lower-order bits of rS are stored into bits 8-15 of the word in memory addressed by the EA. The contents of the next eight lower-order bits of rS are stored into bits 16-23 of the word in memory addressed by the EA. The contents of the next eight lower-order bits of rS are stored into bits 24-31 of the word addressed by the EA.
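On a big-endian system, the byte-reverse loads and stores amount to reading and writing memory as little-endian values. A sketch assuming 32-bit registers, with hypothetical helper names:

```python
MASK32 = 0xFFFFFFFF


def lhbrx_result(mem: bytes, ea: int) -> int:
    """The byte at EA lands in the low-order eight bits of rD."""
    return int.from_bytes(mem[ea:ea + 2], "little")


def lwbrx_result(mem: bytes, ea: int) -> int:
    """Bytes at EA..EA+3 fill rD from its low-order end upward."""
    return int.from_bytes(mem[ea:ea + 4], "little")


def stwbrx(mem: bytearray, ea: int, rs_value: int) -> None:
    """The low-order byte of rS is stored into bits 0-7 of the word (the byte at EA)."""
    mem[ea:ea + 4] = (rs_value & MASK32).to_bytes(4, "little")
```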





4.2.3.5 Integer Load and Store Multiple Instructions 

The load/store multiple instructions are used to move blocks of data to and from the GPRs. 
The load multiple and store multiple instructions may have operands that require memory 
accesses crossing a 4-Kbyte page boundary. As a result, these instructions may be 
interrupted by a DSI exception associated with the address translation of the second page. 
Table 4-16 summarizes the integer load and store multiple instructions. 


In the load/store multiple instructions, the combination of the EA and rD (rS) is such that 
the low-order byte of GPR31 is loaded from or stored into the last byte of an aligned quad 
word in memory; if the effective address is not correctly aligned, it may take significantly 
longer to execute. 


In some PowerPC implementations operating with little-endian byte order, execution of an lmw or stmw instruction causes the system alignment error handler to be invoked; see Section 3.1.2, “Byte Ordering,” for more information.
The PowerPC architecture defines the load multiple word (lmw) instruction with rA in the range of registers to be loaded, including the case in which rA = 0, as an invalid form.


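Table 4-16. Integer Load and Store Multiple Instructions

Name | Mnemonic | Operand Syntax | Operation
Load Multiple Word | lmw | rD,d(rA) | The EA is the sum (rA|0) + d. n = (32 - rD). The n consecutive words starting at the EA are loaded into GPRs rD through r31.
Store Multiple Word | stmw | rS,d(rA) | The EA is the sum (rA|0) + d. n = (32 - rS). The contents of GPRs rS through r31 are stored into the n consecutive words starting at the EA.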
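The load/store multiple semantics can be sketched as follows, assuming 32-bit registers and big-endian memory. lmw, stmw, and exts16 are hypothetical helpers; only the lmw invalid form described above is checked.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers and effective addresses


def exts16(d: int) -> int:
    """Sign-extend a 16-bit immediate field."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def lmw(gpr: list, mem: bytes, rd: int, ra: int, d: int) -> None:
    """Model lmw rD,d(rA): load n = 32 - rD words into rD through r31."""
    if rd <= ra <= 31:  # rA in the range to be loaded (covers rA = 0 when rD = 0)
        raise ValueError("invalid form: rA in the range of registers to be loaded")
    base = 0 if ra == 0 else gpr[ra]
    ea = (base + exts16(d)) & MASK32
    for i in range(32 - rd):
        off = ea + 4 * i
        gpr[rd + i] = int.from_bytes(mem[off:off + 4], "big")


def stmw(gpr: list, mem: bytearray, rs: int, ra: int, d: int) -> None:
    """Model stmw rS,d(rA): store rS through r31 into n = 32 - rS words."""
    base = 0 if ra == 0 else gpr[ra]
    ea = (base + exts16(d)) & MASK32
    for i in range(32 - rs):
        off = ea + 4 * i
        mem[off:off + 4] = (gpr[rs + i] & MASK32).to_bytes(4, "big")
```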


4.2.3.6 Integer Load and Store String Instructions 


The integer load and store string instructions allow movement of data from memory to 
registers or from registers to memory without concern for alignment. These instructions can 
be used for a short move between arbitrary memory locations or to initiate a long move 
between misaligned memory fields. However, in some implementations, these instructions 
are likely to have greater latency and take longer to execute, perhaps much longer, than a 
sequence of individual load or store instructions that produce the same results. Table 4-17 
summarizes the integer load and store string instructions. 


Load and store string instructions execute more efficiently when rD or rS = 5, and the last 
register loaded or stored is less than or equal to 12. 


In some PowerPC implementations operating with little-endian byte order, execution of a load or store string instruction causes the system alignment error handler to be invoked; see Section 3.1.2, “Byte Ordering,” for more information.


Table 4-17. Integer Load and Store String Instructions

Name | Mnemonic | Operand Syntax | Operation
Load String Word Immediate | lswi | rD,rA,NB | The EA is (rA|0).
Load String Word Indexed | lswx | rD,rA,rB | The EA is the sum (rA|0) + (rB).
Store String Word Immediate | stswi | rS,rA,NB | The EA is (rA|0).
Store String Word Indexed | stswx | rS,rA,rB | The EA is the sum (rA|0) + (rB).

Load string and store string instructions may involve operands that are not word-aligned. As described in Section 6.4.6, “Alignment Exception (0x00600),” a misaligned string operation suffers a performance penalty compared to an aligned operation of the same type. A non-word-aligned string operation that crosses a double-word boundary is also slower than a word-aligned string operation.
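As an illustration of how lswi distributes bytes across registers, the sketch below assumes the following behavior: NB = 0 means 32 bytes, bytes fill each register from its high-order end, register numbers wrap from r31 to r0, and unfilled low-order bytes of the last register are cleared. It assumes 32-bit registers, uses a hypothetical helper name, and does not check invalid forms (such as rA falling within the registers loaded).

```python
def lswi(gpr: list, mem: bytes, rd: int, ra: int, nb: int) -> None:
    """Model lswi rD,rA,NB: load NB bytes (32 if NB = 0) starting at EA = (rA|0)."""
    n = nb if nb != 0 else 32
    ea = 0 if ra == 0 else gpr[ra]
    r = (rd - 1) % 32
    for i in range(n):
        if i % 4 == 0:
            r = (r + 1) % 32  # next register; r31 wraps around to r0
            gpr[r] = 0        # unfilled low-order bytes end up cleared
        gpr[r] |= mem[ea + i] << (24 - 8 * (i % 4))  # fill from the high-order byte
```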
4.2.3.7 Floating-Point Load and Store Address Generation
Floating-point load and store operations generate effective addresses using the register 
indirect with immediate index addressing mode and register indirect with index addressing 
mode. Floating-point loads and stores are not supported for direct-store interface accesses. 
The use of floating-point loads and stores for direct-store interface accesses results in an 
alignment exception. Note that the direct-store facility is being phased out of the 
architecture and is not likely to be supported in future devices. 


4.2.3.7.1 Register Indirect with Immediate Index Addressing for Floating- 
Point Loads and Stores 

Instructions using this addressing mode contain a signed 16-bit immediate index (d operand), which is sign-extended to 32 bits and added to the contents of a GPR specified in the instruction (rA operand) to generate the effective address. If the rA field of the instruction specifies r0, a value of zero is added to the immediate index (d operand) in place of the contents of r0. The option to specify rA or 0 is shown in the instruction descriptions as (rA|0).


Figure 4-4 shows how an effective address is generated when using register indirect with 
immediate index addressing for floating-point loads and stores. 
Figure 4-4. Register Indirect with Immediate Index Addressing for Floating-Point
Loads/Stores 
4.2.3.7.2 Register Indirect with Index Addressing for Floating-Point Loads
and Stores 

Instructions using this addressing mode add the contents of two GPRs (specified in operands rA and rB) to generate the effective address. A zero in the rA operand causes a zero to be added to the contents of the GPR specified in operand rB. This is shown in the instruction descriptions as (rA|0).


Figure 4-5 shows how an effective address is generated when using register indirect with 
index addressing. 
Figure 4-5. Register Indirect with Index Addressing for Floating-Point Loads/Stores


The PowerPC architecture defines floating-point load and store with update instructions (lfsu, lfsux, lfdu, lfdux, stfsu, stfsux, stfdu, stfdux) with operand rA = 0 as invalid forms of the instructions. In addition, it defines floating-point load and store instructions with the CR updating option enabled (Rc bit, bit 31 = 1) to be an invalid form.


The PowerPC architecture defines that the FPSCR[UE] bit should not be used to determine 
whether denormalization should be performed on floating-point stores. 


4.2.3.8 Floating-Point Load Instructions 

There are two forms of the floating-point load instruction—single-precision and double- 
precision operand formats. Because the FPRs support only the floating-point double- 
precision format, single-precision floating-point load instructions convert single-precision 
data to double-precision format before loading the operands into the target FPR. This 
conversion is described fully in Section D.6, “Floating-Point Load Instructions.” 
Table 4-18 provides a summary of the floating-point load instructions. 
Note that the PowerPC architecture defines load with update instructions with rA = 0 as an
invalid form. 


Table 4-18. Floating-Point Load Instructions

Name | Mnemonic | Operand Syntax | Operation
Load Floating-Point Single | lfs | frD,d(rA) | The EA is the sum (rA|0) + d. The word in memory addressed by the EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double-precision format and placed into frD.
Load Floating-Point Single Indexed | lfsx | frD,rA,rB | The EA is the sum (rA|0) + (rB). The word in memory addressed by the EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double-precision format and placed into frD.
Load Floating-Point Single with Update | lfsu | frD,d(rA) | The EA is the sum (rA) + d. The word in memory addressed by the EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double-precision format and placed into frD. The EA is placed into the register specified by rA.
Load Floating-Point Single with Update Indexed | lfsux | frD,rA,rB | The EA is the sum (rA) + (rB). The word in memory addressed by the EA is interpreted as a floating-point single-precision operand. This word is converted to floating-point double-precision format and placed into frD. The EA is placed into the register specified by rA.
Load Floating-Point Double | lfd | frD,d(rA) | The EA is the sum (rA|0) + d. The double word in memory addressed by the EA is placed into register frD.
Load Floating-Point Double Indexed | lfdx | frD,rA,rB | The EA is the sum (rA|0) + (rB). The double word in memory addressed by the EA is placed into register frD.
Load Floating-Point Double with Update | lfdu | frD,d(rA) | The EA is the sum (rA) + d. The double word in memory addressed by the EA is placed into register frD. The EA is placed into the register specified by rA.
Load Floating-Point Double with Update Indexed | lfdux | frD,rA,rB | The EA is the sum (rA) + (rB). The double word in memory addressed by the EA is placed into register frD. The EA is placed into the register specified by rA.
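For normalized, zero, infinity, and NaN operands, the single-to-double conversion performed by lfs, lfsx, lfsu, and lfsux is a pure bit rearrangement (see Section D.6). The sketch below reproduces that mapping on integer bit images; lfs_image and as_double are hypothetical names, and denormalized single-precision operands, which require renormalization, are deliberately left out.

```python
import struct


def lfs_image(word: int) -> int:
    """Return the 64-bit FPR image that lfs would place in frD for a 32-bit word.

    Handles normalized, zero, infinity, and NaN operands only.
    """
    exponent = (word >> 23) & 0xFF
    if exponent == 0 and (word & 0x7FFFFF) != 0:
        raise NotImplementedError("denormalized single operands are not modeled")
    w01 = (word >> 30) & 0b11  # WORD[0:1]
    w1 = w01 & 1               # WORD[1]
    # Normalized: frD[2:4] = NOT WORD[1]; zero, infinity, NaN: frD[2:4] = WORD[1].
    fill = (w1 ^ 1) if 0 < exponent < 255 else w1
    top5 = (w01 << 3) | (0b111 if fill else 0)  # frD[0:4]
    rest = (word & 0x3FFFFFFF) << 29            # frD[5:63] = WORD[2:31] || 29 zeros
    return (top5 << 59) | rest


def as_double(image: int) -> float:
    """Reinterpret a 64-bit image as an IEEE double (helper for checking)."""
    return struct.unpack(">d", struct.pack(">Q", image))[0]
```

For normalized operands the three filled bits perform the exponent rebias (a single-precision exponent e becomes e + 896), which is why no arithmetic is needed.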
4.2.3.9 Floating-Point Store Instructions

This section describes floating-point store instructions. There are three basic forms of the 
store instruction—single-precision, double-precision, and integer. The integer form is 
supported by the stfiwx instruction. (Note that the stfiwx instruction is defined as optional 
by the PowerPC architecture to ensure backwards compatibility with earlier processors; 
however, it will likely be required for subsequent PowerPC processors.) Because the FPRs 
support only floating-point, double-precision format for floating-point data, single- 
precision floating-point store instructions convert double-precision data to single-precision 
format before storing the operands. The conversion steps are described fully in Section D.7, 
“Floating-Point Store Instructions.” Table 4-19 provides a summary of the floating-point 
store instructions. 


Note that the PowerPC architecture defines store with update instructions with rA = 0 as an 
invalid form. 


Table 4-19 provides the floating-point store instructions for the PowerPC processors. 


Table 4-19. Floating-Point Store Instructions

Name | Mnemonic | Operand Syntax | Operation
Store Floating-Point Single | stfs | frS,d(rA) | The EA is the sum (rA|0) + d. The contents of frS are converted to single-precision and stored into the word in memory addressed by the EA.
Store Floating-Point Single Indexed | stfsx | frS,rA,rB | The EA is the sum (rA|0) + (rB). The contents of frS are converted to single-precision and stored into the word in memory addressed by the EA.
Store Floating-Point Single with Update | stfsu | frS,d(rA) | The EA is the sum (rA) + d. The contents of frS are converted to single-precision and stored into the word in memory addressed by the EA. The EA is placed into rA.
Store Floating-Point Single with Update Indexed | stfsux | frS,rA,rB | The EA is the sum (rA) + (rB). The contents of frS are converted to single-precision and stored into the word in memory addressed by the EA. The EA is placed into rA.
Store Floating-Point Double | stfd | frS,d(rA) | The EA is the sum (rA|0) + d. The contents of frS are stored into the double word in memory addressed by the EA.
Store Floating-Point Double Indexed | stfdx | frS,rA,rB | The EA is the sum (rA|0) + (rB). The contents of frS are stored into the double word in memory addressed by the EA.
Store Floating-Point Double with Update | stfdu | frS,d(rA) | The EA is the sum (rA) + d. The contents of frS are stored into the double word in memory addressed by the EA. The EA is placed into rA.
Store Floating-Point Double with Update Indexed | stfdux | frS,rA,rB | The EA is the sum (rA) + (rB). The contents of frS are stored into the double word in memory addressed by the EA. The EA is placed into rA.
Store Floating-Point as Integer Word Indexed | stfiwx | frS,rA,rB | The EA is the sum (rA|0) + (rB). The contents of the low-order 32 bits of frS are stored, without conversion, into the word in memory addressed by the EA.

Note: The stfiwx instruction is defined as optional by the PowerPC architecture to ensure backwards compatibility with earlier processors; however, it will likely be required for subsequent PowerPC processors.
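On the store side, when no denormalization is required the stored single-precision word is simply a selection of frS bits: frS[0:1] followed by frS[5:34] (see Section D.7). The hypothetical helpers below model that case and stfiwx. They assume frS already holds a value representable in single precision, because any extra fraction bits are dropped rather than rounded.

```python
def stfs_word(frs: int) -> int:
    """32-bit word stored by stfs/stfsx/stfsu/stfsux for a 64-bit FPR image frS.

    Models only the no-denormalization case: the double exponent exceeds 896,
    or frS is a signed zero (infinity and NaN also fall in this case).
    """
    exponent = (frs >> 52) & 0x7FF
    if not (exponent > 896 or (frs & 0x7FFF_FFFF_FFFF_FFFF) == 0):
        raise NotImplementedError("denormalizing store not modeled")
    high = frs >> 62                 # frS[0:1]
    low = (frs >> 29) & 0x3FFF_FFFF  # frS[5:34]
    return (high << 30) | low


def stfiwx_word(frs: int) -> int:
    """stfiwx stores the low-order 32 bits of frS without conversion."""
    return frs & 0xFFFF_FFFF
```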


4.2.4 Branch and Flow Control Instructions 


Some branch instructions can redirect instruction execution conditionally based on the 
value of bits in the CR. When the processor encounters one of these instructions, it scans 
the execution pipelines to determine whether an instruction in progress may affect the 
particular CR bit. If no interlock is found, the branch can be resolved immediately by 
checking the bit in the CR and taking the action defined for the branch instruction. 


If an interlock is detected, the branch is considered unresolved, and its direction may be predicted either by using the y bit (as described in Table 4-20) or by using dynamic prediction. The interlock is monitored while instructions are fetched for the predicted branch. When the interlock is cleared, the processor determines whether the prediction was correct based on the value of the CR bit. If the prediction is correct, the branch is considered completed and instruction fetching continues. If the prediction is incorrect, the fetched instructions are purged, and instruction fetching continues along the alternate path.


4.2.4.1 Branch Instruction Address Calculation 


Branch instructions can alter the sequence of instruction execution. Instruction addresses 
are always assumed to be word aligned; the PowerPC processors ignore the two low-order 
bits of the generated branch target address. 


Branch instructions compute the effective address (EA) of the next instruction address 
using the following addressing modes: 

• Branch relative
• Branch conditional to relative address
• Branch to absolute address
• Branch conditional to absolute address
• Branch conditional to link register
• Branch conditional to count register
In the 32-bit mode of a 64-bit implementation, the final step in the address computation is
clearing the high-order 32 bits of the target address.
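The two final adjustments described above, ignoring the two low-order bits and (in the 32-bit mode of a 64-bit implementation) clearing the high-order 32 bits, can be sketched in C as follows. This is an illustrative model, not processor logic; the function name `finalize_branch_target` and the `mode32` flag are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: apply the final steps of branch target generation
   to a raw 64-bit address.
   - The two low-order bits are ignored, since instructions are word aligned.
   - In the 32-bit mode of a 64-bit implementation (mode32 != 0), the
     high-order 32 bits of the target address are cleared. */
static uint64_t finalize_branch_target(uint64_t raw, int mode32)
{
    uint64_t target = raw & ~(uint64_t)3;   /* drop the two low-order bits */
    if (mode32)
        target &= 0xFFFFFFFFull;            /* clear the high-order 32 bits */
    return target;
}
```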


4.2.4.1.1 Branch Relative Addressing Mode 

Instructions that use branch relative addressing generate the next instruction address by 
sign extending and appending 0b00 to the immediate displacement operand LI, and adding 
the resultant value to the current instruction address. Branches using this addressing mode 
have the absolute addressing option disabled (AA field, bit 30, in the instruction 
encoding = 0). The link register (LR) update option can be enabled (LK field, bit 31, in the 
instruction encoding = 1). This option causes the effective address of the instruction 
following the branch instruction to be placed in the LR. 
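As a sketch of the relative computation above, the C function below decodes an I-form branch word and forms the target. It assumes the manual's bit numbering, in which bit 0 is the most-significant bit, so LK (bit 31) is the least-significant bit of the word in C terms. The name `iform_target` is hypothetical, and the function also honors AA = 1 so that the same decoder covers the absolute form.

```c
#include <stdint.h>

/* I-form branch (b, ba, bl, bla), bit 0 = MSB:
   bits 0-5 opcode (18), bits 6-29 LI, bit 30 AA, bit 31 LK. */
static uint32_t iform_target(uint32_t insn, uint32_t cia, uint32_t *lr)
{
    /* With AA and LK masked off, the low 26 bits are LI || 0b00. */
    uint32_t disp = insn & 0x03FFFFFCu;
    if (disp & 0x02000000u)          /* sign bit of LI (bit 6) */
        disp |= 0xFC000000u;         /* sign-extend to 32 bits */

    int aa = (insn >> 1) & 1;        /* bit 30 */
    int lk = insn & 1;               /* bit 31 */

    /* Relative form: add to the current instruction address (mod 2^32). */
    uint32_t target = aa ? disp : cia + disp;
    if (lk)
        *lr = cia + 4;               /* address of the following instruction */
    return target;
}
```

For example, the word 0x4BFFFFFC (b .-4) executed at 0x1000 yields a target of 0xFFC.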


Figure 4-6 shows how the branch target address is generated when using the branch relative 
addressing mode. 
Figure 4-6. Branch Relative Addressing








4.2.4.1.2 Branch Conditional to Relative Addressing Mode 

If the branch conditions are met, instructions that use the branch conditional to relative 
addressing mode generate the next instruction address by sign extending and appending 
0b00 to the immediate displacement operand (BD) and adding the resultant value to the
current instruction address. Branches using this addressing mode have the absolute 
addressing option disabled (AA field, bit 30, in the instruction encoding = 0). The link 
register update option can be enabled (LK field, bit 31, in the instruction encoding = 1). 
This option causes the effective address of the instruction following the branch instruction 
to be placed in the LR. 
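The conditional relative form can be sketched the same way. The C code below splits a B-form word into BO, BI, AA, LK, and the sign-extended BD || 0b00 displacement, assuming bit 0 is the most-significant bit of the word. The names `BForm`, `decode_bform`, and `bform_relative_target` are hypothetical.

```c
#include <stdint.h>

/* B-form branch conditional (bc, bca, bcl, bcla), bit 0 = MSB:
   bits 0-5 opcode (16), 6-10 BO, 11-15 BI, 16-29 BD, 30 AA, 31 LK. */
typedef struct {
    unsigned bo, bi;
    int aa, lk;
    uint32_t disp;                    /* sign-extended BD || 0b00 */
} BForm;

static BForm decode_bform(uint32_t insn)
{
    BForm f;
    f.bo = (insn >> 21) & 0x1Fu;
    f.bi = (insn >> 16) & 0x1Fu;
    f.aa = (insn >> 1) & 1;
    f.lk = insn & 1;
    f.disp = insn & 0xFFFCu;          /* BD || 0b00 is the low 16 bits */
    if (f.disp & 0x8000u)             /* sign bit of BD (bit 16) */
        f.disp |= 0xFFFF0000u;
    return f;
}

/* Target if the branch is taken, relative form (AA = 0). */
static uint32_t bform_relative_target(BForm f, uint32_t cia)
{
    return cia + f.disp;
}
```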


Figure 4-7 shows how the branch target address is generated when using the branch 
conditional relative addressing mode. 
Figure 4-7. Branch Conditional Relative Addressing


4.2.4.1.3 Branch to Absolute Addressing Mode 


Instructions that use branch to absolute addressing mode generate the next instruction 
address by sign extending and appending 0b00 to the LI operand. Branches using this
addressing mode have the absolute addressing option enabled (AA field, bit 30, in the 
instruction encoding = 1). The link register update option can be enabled (LK field, bit 31, 
in the instruction encoding = 1). This option causes the effective address of the instruction 
following the branch instruction to be placed in the LR. 
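Because the absolute target is just the sign-extended LI || 0b00 value, only some addresses can be encoded. The C sketch below is a hypothetical encoder for ba and bla that rejects targets the sign extension cannot produce. It assumes the standard I-form layout (primary opcode 18). The function name `encode_branch_absolute` is illustrative.

```c
#include <stdint.h>

/* Hypothetical encoder for ba (lk = 0) and bla (lk = 1).
   Returns 1 and stores the instruction word if the target equals
   EXTS(LI || 0b00) for some LI; returns 0 otherwise. */
static int encode_branch_absolute(uint32_t target, int lk, uint32_t *insn)
{
    if (target & 3u)
        return 0;                              /* must be word aligned */
    /* Bits 0-6 (the sign extension of LI) must be all 0 or all 1, so the
       reachable ranges are 0x00000000-0x01FFFFFC and 0xFE000000-0xFFFFFFFC. */
    uint32_t high = target & 0xFE000000u;
    if (high != 0 && high != 0xFE000000u)
        return 0;
    *insn = (18u << 26)                        /* primary opcode 18 */
          | (target & 0x03FFFFFCu)             /* LI field */
          | (1u << 1)                          /* AA = 1: absolute */
          | (lk ? 1u : 0u);                    /* LK */
    return 1;
}
```

In other words, ba can reach only the lowest and highest 32 Mbytes of the 32-bit address space.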


Figure 4-8 shows how the branch target address is generated when using the branch to 
absolute addressing mode. 


(Figure: instruction fields at bits 0–5, 6–29, 30, and 31, and the resulting Branch Target Address.)


Figure 4-8. Branch to Absolute Addressing
4.2.4.1.4 Branch Conditional to Absolute Addressing Mode


If the branch conditions are met, instructions that use the branch conditional to absolute 
addressing mode generate the next instruction address by sign extending and appending 
0b00 to the BD operand. Branches using this addressing mode have the absolute addressing 
option enabled (AA field, bit 30, in the instruction encoding = 1). The link register update 
option can be enabled (LK field, bit 31, in the instruction encoding = 1). This option causes 
the effective address of the instruction following the branch instruction to be placed in the LR.
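A minimal C sketch of the conditional absolute target (AA = 1) follows. It takes the instruction word, keeps BD || 0b00, and sign-extends it. The function name is hypothetical. Because of the sign extension, a set BD sign bit sends the branch to the top of the address space.

```c
#include <stdint.h>

/* Target of bca/bcla (AA = 1) if the branch is taken: EXTS(BD || 0b00).
   Only 0x00000000-0x00007FFC and 0xFFFF8000-0xFFFFFFFC are reachable. */
static uint32_t bform_absolute_target(uint32_t insn)
{
    uint32_t ea = insn & 0xFFFCu;      /* BD || 0b00 */
    if (ea & 0x8000u)                  /* BD sign bit (bit 16) set */
        ea |= 0xFFFF0000u;
    return ea;
}
```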


Figure 4-9 shows how the branch target address is generated when using the branch 
conditional to absolute addressing mode. 
Figure 4-9. Branch Conditional to Absolute Addressing

4.2.4.1.5 Branch Conditional to Link Register Addressing Mode

If the branch conditions are met, the branch conditional to link register instruction generates 
the next instruction address by fetching the contents of the LR and clearing the two low-order
bits to zero. The link register update option can be enabled (LK field, bit 31, in the
instruction encoding = 1). This option causes the effective address of the instruction 
following the branch instruction to be placed in the LR. 
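One detail matters for bclrl, where the LR is both the source of the target and the destination of the return address. The C sketch below follows the order of the bclrx pseudocode in Chapter 8 as we read it: the target comes from the old LR, and the LR is overwritten only afterward. Treat this ordering as an assumption to confirm against the user's manual. The function name `bclr_taken` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of bclr/bclrl address generation when the branch
   is taken. The target uses the LR value from before this instruction;
   only then does LK = 1 replace the LR with the return address. */
static uint32_t bclr_taken(uint32_t cia, int lk, uint32_t *lr)
{
    uint32_t target = *lr & ~3u;   /* LR with the two low-order bits cleared */
    if (lk)
        *lr = cia + 4;             /* then record the next instruction's address */
    return target;
}
```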


Figure 4-10 shows how the branch target address is generated when using the branch 
conditional to link register addressing mode. 
Figure 4-10. Branch Conditional to Link Register Addressing

4.2.4.1.6 Branch Conditional to Count Register Addressing Mode

If the branch conditions are met, the branch conditional to count register instruction 
generates the next instruction address by fetching the contents of the count register (CTR) 
and clearing the two low-order bits to zero. The link register update option can be enabled 
(LK field, bit 31, in the instruction encoding = 1). This option causes the effective address 
of the instruction following the branch instruction to be placed in the LR. 
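Because the CTR supplies the target, bcctr cannot also use it as a loop counter. Table 4-21 defines the BO[2] = 0 form as invalid. The C sketch below models one bcctr or bcctrl execution under that rule. It assumes BO[0] is the most-significant bit of the 5-bit field and CR bit 0 is the most-significant bit of `cr`. The name `bcctr_step` and its return convention are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one bcctr/bcctrl execution.
   Returns -1 for the invalid form (BO[2] = 0), 1 if taken, 0 if not.
   *nia receives the next instruction address. */
static int bcctr_step(unsigned bo, unsigned bi, uint32_t cr, uint32_t ctr,
                      uint32_t cia, int lk, uint32_t *lr, uint32_t *nia)
{
    int bo0 = (bo >> 4) & 1;   /* 1: ignore the CR bit */
    int bo1 = (bo >> 3) & 1;   /* value the CR bit must have */
    int bo2 = (bo >> 2) & 1;   /* must be 1: no CTR decrement */
    if (!bo2)
        return -1;                                  /* invalid form */
    int cr_bit = (int)((cr >> (31 - bi)) & 1u);
    int taken = bo0 || (cr_bit == bo1);             /* the CTR is not tested */
    if (lk)
        *lr = cia + 4;                              /* set whether or not taken */
    *nia = taken ? (ctr & ~3u) : cia + 4;
    return taken;
}
```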


Figure 4-11 shows how the branch target address is generated when using the branch 
conditional to count register addressing mode. 
Figure 4-11. Branch Conditional to Count Register Addressing

4.2.4.2 Conditional Branch Control


For branch conditional instructions, the BO operand specifies the conditions under which 
the branch is taken. The first four bits of the BO operand specify how the branch is affected 
by or affects the condition and count registers. The fifth bit, shown in Table 4-20 as having 
the value y, is used by some PowerPC implementations for branch prediction as described 
below. 


The encodings for the BO operands are shown in Table 4-20. 
Table 4-20. BO Operand Encodings 


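BO      Description
0000y   Decrement the CTR, then branch if the decremented CTR ≠ 0 and the condition is FALSE.
0001y   Decrement the CTR, then branch if the decremented CTR = 0 and the condition is FALSE.
001zy   Branch if the condition is FALSE.
0100y   Decrement the CTR, then branch if the decremented CTR ≠ 0 and the condition is TRUE.
0101y   Decrement the CTR, then branch if the decremented CTR = 0 and the condition is TRUE.
011zy   Branch if the condition is TRUE.
1z00y   Decrement the CTR, then branch if the decremented CTR ≠ 0.
1z01y   Decrement the CTR, then branch if the decremented CTR = 0.
1z1zz   Branch always.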


In this table, z indicates a bit that is ignored. 
Note that the z bits should be cleared, as they may be assigned a meaning in some future version of the 
PowerPC architecture. 





The y bit provides a hint about whether a conditional branch is likely to be taken, and may be used by some 
PowerPC implementations to improve performance. 


The branch always encoding of the BO operand does not have a y bit. 


Clearing the y bit indicates a predicted behavior for the branch instruction as follows: 
• For bcx with a negative value in the displacement operand, the branch is taken.
• In all other cases (bcx with a non-negative value in the displacement operand, bclrx,
  or bcctrx), the branch is not taken.


Setting the y bit reverses the preceding indications. 


The sign of the displacement operand is used as described above even if the target is an 
absolute address. The default value for the y bit should be 0, and should only be set to 1 if 
software has determined that the prediction corresponding to y = 1 is more likely to be 
correct than the prediction corresponding to y = 0. Software that does not compute branch 
predictions should clear the y bit. 
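The rule above tells software how to choose y. The default prediction with y = 0 is "taken" exactly for bcx with a negative displacement, so y should be set only when the expected direction differs from that default. A minimal C sketch follows. The names `default_prediction` and `choose_y` are hypothetical, and the rule does not apply to the branch always encoding, which has no y bit.

```c
/* is_bc_negative: 1 only for bcx with a negative displacement;
   0 for bcx with a non-negative displacement and for bclrx/bcctrx. */
static int default_prediction(int is_bc_negative, int y)
{
    return is_bc_negative ^ y;       /* setting y reverses the default */
}

/* Choose y so that the static prediction matches expect_taken. */
static int choose_y(int is_bc_negative, int expect_taken)
{
    return expect_taken != is_bc_negative;
}
```

A backward loop-closing branch that is expected to be taken therefore keeps y = 0, while a forward branch expected to be taken needs y = 1.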
In most cases, the branch should be predicted to be taken if the value of the following
expression is 1, and predicted to fall through if the value is 0.


$$\big((\mathrm{BO}[0] \mathbin{\&} \mathrm{BO}[2]) \mid S\big) \neq \mathrm{BO}[4]$$


In the expression above, S (bit 16 of the branch conditional instruction encoding) is the sign
bit of the displacement operand if the instruction has a displacement operand, and is 0 if the
operand is reserved. BO[4] is the y bit, or 0 for the branch always encoding of the BO
operand. (Advantage is taken of the fact that, for bclrx and bcctrx, bit 16 of the instruction
is part of a reserved operand and therefore must be 0.)
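The prediction expression can be evaluated directly. The C sketch below takes the 5-bit BO field (BO[0] as its most-significant bit) and S, and forces BO[4] to 0 for the branch always encoding (1z1zz), as the text requires. The function name `predict_taken` is hypothetical.

```c
/* Evaluate ((BO[0] & BO[2]) | S) != BO[4].
   s is bit 16 of the instruction (0 for bclrx/bcctrx). */
static int predict_taken(unsigned bo, int s)
{
    int bo0 = (bo >> 4) & 1;
    int bo2 = (bo >> 2) & 1;
    int always = bo0 && bo2;                    /* 1z1zz: branch always */
    int bo4 = always ? 0 : (int)(bo & 1u);      /* y bit, or 0 for branch always */
    return ((bo0 & bo2) | s) != bo4;
}
```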


The 5-bit BI operand in branch conditional instructions specifies which of the 32 bits in the 
CR represents the condition to test. 
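Putting BO (Table 4-20) and BI together, the decision for a conditional branch can be sketched in C. The CTR is decremented first when BO[2] = 0, and the branch is taken only when both the CTR test and the CR test pass. The sketch assumes BO[0] is the most-significant bit of the 5-bit field and CR bit 0 is the most-significant bit of `cr`. The function name `bc_condition` is hypothetical.

```c
#include <stdint.h>

/* Returns 1 if the conditional branch is taken; may decrement *ctr. */
static int bc_condition(unsigned bo, unsigned bi, uint32_t cr, uint32_t *ctr)
{
    int bo0 = (bo >> 4) & 1;   /* 1: ignore the CR bit */
    int bo1 = (bo >> 3) & 1;   /* value the CR bit must have */
    int bo2 = (bo >> 2) & 1;   /* 1: do not decrement or test the CTR */
    int bo3 = (bo >> 1) & 1;   /* 1: branch if CTR == 0; 0: branch if CTR != 0 */

    if (!bo2)
        *ctr -= 1;                                   /* decrement first */
    int ctr_ok  = bo2 || ((*ctr != 0) != bo3);
    int cond_ok = bo0 || ((int)((cr >> (31 - bi)) & 1u) == bo1);
    return ctr_ok && cond_ok;
}
```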


When the branch instructions contain immediate addressing operands, the target addresses 
can be computed sufficiently ahead of the branch instruction that instructions can be 
fetched along the target path. If the branch instructions use the link and count registers, 
instructions along the target path can be fetched if the link or count register is loaded 
sufficiently ahead of the branch instruction. 


Branching can be conditional or unconditional. Optionally, a branch return address is
created by placing the effective address of the instruction following the branch
instruction in the LR after the branch target address has been computed. This is done
regardless of whether the branch is taken. Some processors may keep a stack of the link
register values most recently set by branch and link instructions, with the possible
exception of the form shown below for obtaining the address of the next instruction. To
benefit from this stack, use the following programming conventions.


In the following examples, let A, B, and Glue represent subroutine labels:

• Obtaining the address of the next instruction: use the following form of branch and
  link:
      bcl 20,31,$+4

• Loop counts:
  Keep them in the count register, and use one of the branch conditional instructions
  to decrement the count and to control branching (for example, branching back to the
  start of a loop if the decremented counter value is nonzero).

• Computed GOTOs, case statements, etc.:
  Use the count register to hold the address to branch to, and use the bcctr instruction
  with the link register option disabled (LK = 0) to branch to the selected address.

• Direct subroutine linkage, where A calls B and B returns to A. The two branches
  should be as follows:
  – A calls B: use a branch instruction that enables the link register (LK = 1).
  – B returns to A: use the bclr instruction with the link register option disabled
    (LK = 0) (the return address is in, or can be restored to, the link register).

• Indirect subroutine linkage:
  A calls Glue, Glue calls B, and B returns to A rather than to Glue. (Such a
  calling sequence is common in linkage code used when the subroutine that the
  programmer wants to call, here B, is in a different module from the caller: the binder
  inserts “glue” code to mediate the branch.) The three branches should be as follows:
  – A calls Glue: use a branch instruction that sets the link register with the link
    register option enabled (LK = 1).
  – Glue calls B: place the address of B in the count register, and use the bcctr
    instruction with the link register option disabled (LK = 0).
  – B returns to A: use the bclr instruction with the link register option disabled
    (LK = 0) (the return address is in, or can be restored to, the link register).
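To see why these conventions matter, here is a hypothetical C model of a link-register stack. Branch-and-link instructions push CIA + 4. The model assumes the predictor recognizes the bcl 20,31,$+4 idiom and does not push for it. A bclr pops a prediction and compares it with the architected LR, and a bcctr with LK = 0 leaves the stack alone. The depth, the struct `LinkModel`, and all function names are illustrative only. Real processors may behave differently or keep no such stack at all.

```c
#include <stdint.h>
#include <stddef.h>

#define LINK_STACK_DEPTH 4   /* illustrative depth */

typedef struct {
    uint32_t stack[LINK_STACK_DEPTH];
    size_t depth;            /* number of valid entries */
    uint32_t lr;             /* architected link register */
    unsigned mispredicts;    /* returns whose predicted target was wrong */
} LinkModel;

/* Branch and link (LK = 1): LR <- CIA + 4; push unless this is the
   next-address idiom bcl 20,31,$+4. */
static void branch_and_link(LinkModel *m, uint32_t cia, int next_address_idiom)
{
    m->lr = cia + 4;
    if (next_address_idiom)
        return;                                   /* stack left untouched */
    if (m->depth == LINK_STACK_DEPTH) {           /* full: drop the oldest entry */
        for (size_t i = 1; i < LINK_STACK_DEPTH; i++)
            m->stack[i - 1] = m->stack[i];
        m->depth--;
    }
    m->stack[m->depth++] = m->lr;
}

/* bcctr with LK = 0 (the Glue branch): neither the LR nor the stack changes. */
static uint32_t branch_to_ctr(uint32_t ctr)
{
    return ctr & ~3u;
}

/* bclr with LK = 0: return through the LR, checking the prediction. */
static uint32_t branch_to_lr(LinkModel *m)
{
    uint32_t actual = m->lr & ~3u;
    if (m->depth == 0 || m->stack[--m->depth] != actual)
        m->mispredicts++;
    return actual;
}
```

In this model, the indirect linkage sequence (bl to Glue, bcctr to B, bclr back to A) predicts correctly. By contrast, a plain bl $+4 inside B leaves a stale entry on the stack and causes B's return to mispredict.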


4.2.4.3 Branch Instructions 
Table 4-21 describes the branch instructions provided by the PowerPC processors. 


Table 4-21. Branch Instructions 


Branch: b, ba, bl, bla    Operands: target_addr
    b       Branch. Branch to the address computed as the sum of the immediate address
            and the address of the current instruction.
    ba      Branch Absolute. Branch to the absolute address specified.
    bl      Branch then Link. Branch to the address computed as the sum of the immediate
            address and the address of the current instruction. The instruction address
            following this instruction is placed into the link register (LR).
    bla     Branch Absolute then Link. Branch to the absolute address specified. The
            instruction address following this instruction is placed into the LR.

Branch Conditional: bc, bca, bcl, bcla    Operands: BO,BI,target_addr
    The BI operand specifies the bit in the CR to be used as the condition of the branch.
    The BO operand is used as described in Table 4-20.
    bc      Branch Conditional. Branch conditionally to the address computed as the sum
            of the immediate address and the address of the current instruction.
    bca     Branch Conditional Absolute. Branch conditionally to the absolute address
            specified.
    bcl     Branch Conditional then Link. Branch conditionally to the address computed as
            the sum of the immediate address and the address of the current instruction.
            The instruction address following this instruction is placed into the LR.
    bcla    Branch Conditional Absolute then Link. Branch conditionally to the absolute
            address specified. The instruction address following this instruction is
            placed into the LR.

Branch Conditional to Link Register: bclr, bclrl    Operands: BO,BI
    The BI operand specifies the bit in the CR to be used as the condition of the branch.
    The BO operand is used as described in Table 4-20.
    bclr    Branch Conditional to Link Register. Branch conditionally to the address in
            the LR.
    bclrl   Branch Conditional to Link Register then Link. Branch conditionally to the
            address specified in the LR. The instruction address following this
            instruction is then placed into the LR.

Branch Conditional to Count Register: bcctr, bcctrl    Operands: BO,BI
    The BI operand specifies the bit in the CR to be used as the condition of the branch.
    The BO operand is used as described in Table 4-20.
    bcctr   Branch Conditional to Count Register. Branch conditionally to the address
            specified in the count register.
    bcctrl  Branch Conditional to Count Register then Link. Branch conditionally to the
            address specified in the count register. The instruction address following
            this instruction is placed into the LR.
    Note: If the “decrement and test CTR” option is specified (BO[2] = 0), the instruction
    form is invalid.
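For reference when reading disassembly, the C sketch below packs the branch conditional forms of Table 4-21 into instruction words. It assumes the standard encodings: B-form with primary opcode 16, and XL-form with primary opcode 19 and extended opcode 16 (bclr) or 528 (bcctr). The function and macro names are hypothetical. The encoder does not check for invalid forms such as bcctr with BO[2] = 0.

```c
#include <stdint.h>

/* Field layout, bit 0 = MSB:
   B-form  (bc/bca/bcl/bcla): opcode 16 | BO 6-10 | BI 11-15 | BD 16-29 | AA 30 | LK 31
   XL-form (bclr/bcctr):      opcode 19 | BO 6-10 | BI 11-15 | 0 16-20 | XO 21-30 | LK 31 */
#define XO_BCLR  16u
#define XO_BCCTR 528u

/* disp must be a multiple of 4 in the range -32768..32764. */
static uint32_t encode_bc(unsigned bo, unsigned bi, int32_t disp, int aa, int lk)
{
    return (16u << 26) | ((bo & 0x1Fu) << 21) | ((bi & 0x1Fu) << 16)
         | ((uint32_t)disp & 0xFFFCu) | ((aa ? 1u : 0u) << 1) | (lk ? 1u : 0u);
}

static uint32_t encode_xl_branch(unsigned xo, unsigned bo, unsigned bi, int lk)
{
    return (19u << 26) | ((bo & 0x1Fu) << 21) | ((bi & 0x1Fu) << 16)
         | ((xo & 0x3FFu) << 1) | (lk ? 1u : 0u);
}
```

For example, `encode_xl_branch(XO_BCLR, 20, 0, 0)` yields 0x4E800020, the familiar blr word.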


4.2.4.4 Simplified Mnemonics for Branch Processor Instructions 


To simplify assembly language programming, a set of simplified mnemonics and symbols 
is provided for the most frequently used forms of branch conditional, compare, trap, rotate 
and shift, and certain other instructions. See Appendix F, “Simplified Mnemonics,” for a 
list of simplified mnemonic examples. 





4.2.4.5 Condition Register Logical Instructions 


Condition register logical instructions, shown in Table 4-22, and the Move Condition 
Register Field (mcrf) instruction are also defined as flow control instructions.


Note that if the LR update option is enabled for any of these instructions, the PowerPC 
architecture defines these forms of the instructions as invalid. 


Table 4-22. Condition Register Logical Instructions 
Condition Register AND: crand crbD,crbA,crbB
    The CR bit specified by crbA is ANDed with the CR bit specified by crbB. The result is
    placed into the CR bit specified by crbD.

Condition Register OR: cror crbD,crbA,crbB
    The CR bit specified by crbA is ORed with the CR bit specified by crbB. The result is
    placed into the CR bit specified by crbD.

Condition Register XOR: crxor crbD,crbA,crbB
    The CR bit specified by crbA is XORed with the CR bit specified by crbB. The result is
    placed into the CR bit specified by crbD.

Condition Register NAND: crnand crbD,crbA,crbB
    The CR bit specified by crbA is ANDed with the CR bit specified by crbB. The
    complemented result is placed into the CR bit specified by crbD.

Condition Register NOR: crnor crbD,crbA,crbB
    The CR bit specified by crbA is ORed with the CR bit specified by crbB. The
    complemented result is placed into the CR bit specified by crbD.

Condition Register Equivalent: creqv crbD,crbA,crbB
    The CR bit specified by crbA is XORed with the CR bit specified by crbB. The
    complemented result is placed into the CR bit specified by crbD.

Condition Register AND with Complement: crandc crbD,crbA,crbB
    The CR bit specified by crbA is ANDed with the complement of the CR bit specified by
    crbB, and the result is placed into the CR bit specified by crbD.

Condition Register OR with Complement: crorc crbD,crbA,crbB
    The CR bit specified by crbA is ORed with the complement of the CR bit specified by
    crbB, and the result is placed into the CR bit specified by crbD.

Move Condition Register Field: mcrf crfD,crfS
    The contents of crfS are copied into crfD. No other condition register fields are
    changed.
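The CR logical instructions and mcrf operate on individual bits and 4-bit fields of a 32-bit register. The C sketch below models them, assuming CR bit 0 is the most-significant bit and CR field 0 is bits 0–3. The enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum { CR_AND, CR_OR, CR_XOR, CR_NAND, CR_NOR, CR_EQV, CR_ANDC, CR_ORC } CrOp;

static int cr_get(uint32_t cr, unsigned bit) { return (int)((cr >> (31 - bit)) & 1u); }

static uint32_t cr_set(uint32_t cr, unsigned bit, int v)
{
    uint32_t mask = 1u << (31 - bit);
    return v ? (cr | mask) : (cr & ~mask);
}

/* crand, cror, ..., crorc: crbD <- crbA op crbB; other bits unchanged. */
static uint32_t cr_logical(uint32_t cr, CrOp op, unsigned crbD, unsigned crbA, unsigned crbB)
{
    int a = cr_get(cr, crbA), b = cr_get(cr, crbB), r = 0;
    switch (op) {
    case CR_AND:  r = a & b;    break;
    case CR_OR:   r = a | b;    break;
    case CR_XOR:  r = a ^ b;    break;
    case CR_NAND: r = !(a & b); break;
    case CR_NOR:  r = !(a | b); break;
    case CR_EQV:  r = !(a ^ b); break;
    case CR_ANDC: r = a & !b;   break;
    case CR_ORC:  r = a | !b;   break;
    }
    return cr_set(cr, crbD, r);
}

/* mcrf crfD,crfS: copy one 4-bit field; no other field changes. */
static uint32_t mcrf(uint32_t cr, unsigned crfD, unsigned crfS)
{
    uint32_t field = (cr >> (28 - 4 * crfS)) & 0xFu;
    uint32_t mask = 0xFu << (28 - 4 * crfD);
    return (cr & ~mask) | (field << (28 - 4 * crfD));
}
```

With the same bit as all three operands, crxor clears that bit and creqv sets it. These are the simplified mnemonics crclr and crset.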


4.2.4.6 Trap Instructions 


The trap instructions shown in Table 4-23 are provided to test for a specified set of 
conditions. If any of the conditions tested by a trap instruction are met, the system trap 
handler is invoked. If the tested conditions are not met, instruction execution continues 
normally. See Appendix F, “Simplified Mnemonics,” for a complete set of simplified 
mnemonics. 





Table 4-23. Trap Instructions 
Trap Word Immediate: twi TO,rA,SIMM
    The contents of rA are compared with the sign-extended SIMM operand.
    If any bit in the TO operand is set and its corresponding condition is met
    by the result of the comparison, the system trap handler is invoked.

Trap Word: tw TO,rA,rB
    The contents of rA are compared with the contents of rB. If any bit in the
    TO operand is set and its corresponding condition is met by the result of
    the comparison, the system trap handler is invoked.
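The TO bit assignments are not repeated in this table. The C sketch below assumes the architecture's definitions as given for tw in Chapter 8: TO[0] is signed less than, TO[1] signed greater than, TO[2] equal, TO[3] unsigned less than, and TO[4] unsigned greater than, with TO[0] as the most-significant bit of the 5-bit field. The function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 if tw TO,a,b would invoke the system trap handler. */
static int trap_word(unsigned to, uint32_t a, uint32_t b)
{
    int32_t sa = (int32_t)a, sb = (int32_t)b;   /* signed views of the operands */
    return ((to & 0x10u) && sa < sb)            /* TO[0]: less than, signed */
        || ((to & 0x08u) && sa > sb)            /* TO[1]: greater than, signed */
        || ((to & 0x04u) && a == b)             /* TO[2]: equal */
        || ((to & 0x02u) && a < b)              /* TO[3]: less than, unsigned */
        || ((to & 0x01u) && a > b);             /* TO[4]: greater than, unsigned */
}

/* twi compares against the sign-extended SIMM operand. */
static int trap_word_immediate(unsigned to, uint32_t a, int16_t simm)
{
    return trap_word(to, a, (uint32_t)(int32_t)simm);
}
```

TO = 31 traps unconditionally, which is how the simplified mnemonic trap is formed.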
4.2.4.7 System Linkage Instruction—UISA

Table 4-24 describes the System Call (sc) instruction that permits a program to call on the 
system to perform a service. See Section 4.4.1, “System Linkage Instructions—OEA,” for 
a complete description of the sc instruction.


Table 4-24. System Linkage Instruction—UISA 


System Call: sc (no operands)
    This instruction calls the operating system to perform a service. When control is
    returned to the program that executed the system call, the contents of the registers
    depend on the register conventions used by the program providing the system
    service. This instruction is context synchronizing as described in Section 4.1.5.1,
    “Context Synchronizing Instructions.”
    See Section 4.4.1, “System Linkage Instructions—OEA,” for a complete description
    of the sc instruction.


4.2.5 Processor Control Instructions—UISA 


Processor control instructions are used to read from and write to the condition register 
(CR), machine state register (MSR), and special-purpose registers (SPRs). See 
Section 4.3.1, “Processor Control Instructions—VEA,” for the mftb instruction and
Section 4.4.2, “Processor Control Instructions—OEA,” for information about the
instructions used for reading from and writing to the MSR and SPRs. 





4.2.5.1 Move to/from Condition Register Instructions 
Table 4-25 summarizes the instructions for reading from or writing to the condition register. 


Table 4-25. Move to/from Condition Register Instructions 


Move to Condition Register Fields: mtcrf CRM,rS
    The contents of rS are placed into the CR under control of the field mask specified
    by operand CRM. The field mask identifies the 4-bit fields affected. Let i be an
    integer in the range 0–7. If CRM(i) = 1, CR field i (CR bits 4 * i through 4 * i + 3)
    is set to the contents of the corresponding field of rS.

Move to Condition Register from XER: mcrxr crfD
    The contents of XER[0–3] are copied into the condition register field designated by
    crfD. All other CR fields remain unchanged. The contents of XER[0–3] are cleared.

Move from Condition Register: mfcr rD
    The contents of the CR are placed into rD.
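The field-mask rule for mtcrf and the copy-and-clear behavior of mcrxr can be sketched in C as follows. The sketch assumes CR bit 0 and XER bit 0 are the most-significant bits and CRM(0) is the most-significant bit of the 8-bit mask. The function names are hypothetical.

```c
#include <stdint.h>

/* mtcrf CRM,rS: CR field i takes rS field i wherever CRM(i) = 1. */
static uint32_t mtcrf(uint32_t cr, unsigned crm, uint32_t rs)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < 8; i++)
        if ((crm >> (7 - i)) & 1u)
            mask |= 0xFu << (28 - 4 * i);     /* select CR field i */
    return (rs & mask) | (cr & ~mask);
}

/* mcrxr crfD: copy XER[0-3] into CR field crfD, then clear XER[0-3]. */
static void mcrxr(uint32_t *cr, uint32_t *xer, unsigned crfD)
{
    uint32_t bits = (*xer >> 28) & 0xFu;
    uint32_t shift = 28 - 4 * crfD;
    *cr = (*cr & ~(0xFu << shift)) | (bits << shift);
    *xer &= 0x0FFFFFFFu;
}
```

With CRM = 0xFF, every field is written, so the whole CR takes the value of rS (the simplified mnemonic mtcr).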
4.2.5.2 Move to/from Special-Purpose Register Instructions (UISA)


Table 4-26 provides a brief description of the mtspr and mfspr instructions. For more 
detailed information refer to Chapter 8, “Instruction Set.” 


Table 4-26. Move to/from Special-Purpose Register Instructions (UISA) 


Move from Special-Purpose Register: mfspr rD,SPR
    The contents of the specified SPR are placed in rD.

Move to Special-Purpose Register: mtspr SPR,rS
    The value specified by rS is placed in the specified SPR.
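One encoding detail often trips up hand assembly: the 10-bit SPR number is stored in the instruction with its two 5-bit halves swapped. The same convention explains the TBR values listed in Table 4-29. The C sketch below is a hypothetical encoder that assumes the standard X-form layout (primary opcode 31, extended opcode 339 for mfspr and 467 for mtspr) and the SPR numbers 8 (LR) and 9 (CTR).

```c
#include <stdint.h>

/* Instruction field spr[0-4] (bits 11-15) holds the low-order five bits of
   the SPR number n; spr[5-9] (bits 16-20) holds the high-order five bits. */
static uint32_t spr_field(unsigned n)
{
    return ((n & 0x1Fu) << 5) | ((n >> 5) & 0x1Fu);
}

static uint32_t encode_mfspr(unsigned rD, unsigned spr)
{
    return (31u << 26) | ((rD & 0x1Fu) << 21) | (spr_field(spr) << 11) | (339u << 1);
}

static uint32_t encode_mtspr(unsigned spr, unsigned rS)
{
    return (31u << 26) | ((rS & 0x1Fu) << 21) | (spr_field(spr) << 11) | (467u << 1);
}
```

For example, `encode_mfspr(0, 8)` produces 0x7C0802A6, the word for mflr r0.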


4.2.6 Memory Synchronization Instructions—UISA 


Memory synchronization instructions control the order in which memory operations are 
completed with respect to asynchronous events, and the order in which memory operations 
are seen by other processors or memory access mechanisms. 


The number of cycles required to complete a sync instruction depends on system 
parameters and on the processor's state when the instruction is issued. As a result, frequent 
use of this instruction may degrade performance slightly. The eieio instruction may be more 
appropriate than sync for many cases. 


The PowerPC architecture defines the sync instruction with CR update enabled (Rc field, 
bit 31 = 1) to be an invalid form. 


The proper paired use of the lwarx and stwcx. instructions allows programmers to emulate
common semaphore operations such as test and set, compare and swap, exchange memory,
and fetch and add. Examples of these semaphore operations can be found in Appendix E,
“Synchronization Programming Examples.” The lwarx instruction must be paired with a
stwcx. instruction with the same effective address specified by both instructions of the pair.
The only exception is that an unpaired stwcx. instruction to any (scratch) effective address
can be used to clear any reservation held by the processor. Note that the reservation
granularity is implementation-dependent.


The concept behind the use of the lwarx and stwcx. instructions is that a processor may
load a semaphore from memory, compute a result based on the value of the semaphore, and
conditionally store it back to the same location. The conditional store is performed based
upon the existence of a reservation established by the preceding lwarx instruction. If the
reservation exists when the store is executed, the store is performed and a bit is set in the
CR. If the reservation does not exist when the store is executed, the target memory location
is not modified and a bit is cleared in the CR.


The lwarx and stwcx. primitives allow software to read a semaphore, compute a result
based on the value of the semaphore, store the new value back into the semaphore location
only if that location has not been modified since it was first read, and determine whether the
store was successful. If the store was successful, the sequence of instructions from the read of
the semaphore to the store that updated the semaphore appears to have been executed
atomically (that is, no other processor or mechanism modified the semaphore location between
the read and the update), thus providing the equivalent of a real atomic operation. However, in
reality, other processors may have read from the location during this operation.
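The retry pattern these primitives support can be illustrated with a single-threaded C model of one processor's reservation. It is a sketch under stated assumptions, not real hardware behavior. The 32-byte granule is hypothetical, since the real granularity is implementation-dependent. When a stwcx. names a different address than the reservation, the architecture leaves the store undefined, and this model chooses not to store. The names `Cpu`, `lwarx`, `stwcx` (for stwcx.), `other_store`, and `fetch_and_add` are illustrative.

```c
#include <stdint.h>
#include <stdbool.h>

#define GRANULE 32u                 /* assumed reservation granule, in bytes */

typedef struct {
    uint32_t mem[64];               /* toy word-addressed memory: mem[ea / 4] */
    bool reserved;
    uint32_t reserved_ea;
} Cpu;

static uint32_t lwarx(Cpu *c, uint32_t ea)
{
    c->reserved = true;             /* replaces any earlier reservation */
    c->reserved_ea = ea;
    return c->mem[ea / 4];
}

/* Returns 1 if the store was performed (the CR bit is set), 0 otherwise. */
static int stwcx(Cpu *c, uint32_t ea, uint32_t value)
{
    int ok = c->reserved && c->reserved_ea == ea;   /* mismatch: model does not store */
    if (ok)
        c->mem[ea / 4] = value;
    c->reserved = false;            /* any stwcx. clears the reservation */
    return ok;
}

/* A store by another processor to the same granule clears the reservation. */
static void other_store(Cpu *c, uint32_t ea, uint32_t value)
{
    c->mem[ea / 4] = value;
    if (c->reserved && ea / GRANULE == c->reserved_ea / GRANULE)
        c->reserved = false;
}

/* Fetch and add; interfere_once injects one conflicting store to force a retry. */
static uint32_t fetch_and_add(Cpu *c, uint32_t ea, uint32_t delta,
                              int interfere_once, unsigned *attempts)
{
    uint32_t old;
    *attempts = 0;
    do {
        old = lwarx(c, ea);
        ++*attempts;
        if (interfere_once) {
            other_store(c, ea + 4, 99);   /* different word, same granule */
            interfere_once = 0;
        }
    } while (!stwcx(c, ea, old + delta));
    return old;
}
```

A store to a different word in the same granule still forces a retry, which is why granule size can affect performance.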


The lwarx and stwcx. instructions require the EA to be aligned.


In general, the lwarx and stwcx. instructions should be used only in system programs,
which can be invoked by application programs as needed.


At most one reservation exists simultaneously on any processor. The address associated
with the reservation can be changed by a subsequent lwarx instruction. The conditional
store is performed based upon the existence of a reservation established by the preceding
lwarx instruction.


A reservation held by the processor is cleared (or may be cleared, in the case of the fourth 
and fifth bullet items) by one of the following: 


• The processor holding the reservation executes another lwarx instruction; this clears
  the first reservation and establishes a new one.

• The processor holding the reservation executes any stwcx. instruction, regardless of
  whether its address matches that of the lwarx.

• Some other processor executes a store or dcbz to the same reservation granule, or
  modifies a referenced or changed bit in the same reservation granule.

• Some other processor executes a dcbtst, dcbst, dcbf, or dcbi to the same reservation
  granule; whether the reservation is cleared is undefined.

• Some other processor executes a dcba to the same reservation granule. The
  reservation is cleared if the instruction causes the target block to be newly
  established in the data cache or to be modified; otherwise, whether the reservation is
  cleared is undefined.

• Some other mechanism modifies a memory location in the same reservation granule.


Note that exceptions do not clear reservations; however, system software invoked by 
exceptions may clear reservations. 
Table 4-27 summarizes the memory synchronization instructions as defined in the UISA.
See Section 4.3.2, “Memory Synchronization Instructions—VEA,” for details about 
additional memory synchronization (eieio and isync) instructions. 


Table 4-27. Memory Synchronization Instructions—UISA 


Load Word and Reserve Indexed: lwarx rD,rA,rB
    The EA is the sum (rA|0) + (rB). The word in memory addressed by the EA is
    loaded into rD.

Store Word Conditional Indexed: stwcx. rS,rA,rB
    The EA is the sum (rA|0) + (rB).
    If a reservation exists and the effective address specified by the stwcx.
    instruction is the same as that specified by the load and reserve instruction
    that established the reservation, the contents of rS are stored into the word in
    memory addressed by the EA, and the reservation is cleared.
    If a reservation exists but the effective address specified by the stwcx.
    instruction is not the same as that specified by the load and reserve
    instruction that established the reservation, the reservation is cleared, and it is
    undefined whether the contents of rS are stored into the word in memory
    addressed by the EA.
    If a reservation does not exist, the instruction completes without altering
    memory or the contents of the cache.

Synchronize: sync (no operands)
    Executing a sync instruction ensures that all instructions preceding the sync
    instruction appear to have completed before the sync instruction completes,
    and that no subsequent instructions are initiated by the processor until after
    the sync instruction completes. When the sync instruction completes, all
    memory accesses caused by instructions preceding the sync instruction will
    have been performed with respect to all other mechanisms that access
    memory.

See Chapter 8, “Instruction Set,” for more information. 


4.2.7 Recommended Simplified Mnemonics 


To simplify assembly language programs, a set of simplified mnemonics is provided for 
some of the most frequently used operations (such as no-op, load immediate, load address, 
move register, and complement register). Assemblers should provide the simplified 
mnemonics listed in Section F.9, “Recommended Simplified Mnemonics.” Programs 
written to be portable across the various assemblers for the PowerPC architecture should 
not assume the existence of mnemonics not described in this document. 


For a complete list of simplified mnemonics, see Appendix F, “Simplified Mnemonics.” 
4.3 PowerPC VEA Instructions


The PowerPC virtual environment architecture (VEA) describes the semantics of the 
memory model that can be assumed by software processes, and includes descriptions of the 
cache model, cache-control instructions, address aliasing, and other related issues. 
Implementations that conform to the VEA also adhere to the UISA, but may not necessarily 
adhere to the OEA.


This section describes additional instructions that are provided by the VEA. 


4.3.1 Processor Control Instructions—VEA 


The VEA defines the mftb instruction (user-level instruction) for reading the contents of 
the time base register; see Chapter 5, “Cache Model and Memory Coherency,” for more 
information. Table 4-28 describes the mftb instruction. 


Simplified mnemonics are provided (see Section F.8, “Simplified Mnemonics for Special-Purpose
Registers”) for the mftb instruction so it can be coded with the TBR name as part
of the mnemonic rather than requiring it to be coded as an operand. The simplified 
mnemonics Move from Time Base (mftb) and Move from Time Base Upper (mftbu) are 
variants of the mftb instruction rather than of the mfspr instruction. The mftb instruction 
serves as both a basic and simplified mnemonic. Assemblers recognize an mftb mnemonic 
with two operands as the basic form, and an mftb mnemonic with one operand as the 
simplified form. 


On 32-bit implementations, it is not possible to read the entire 64-bit time base register in 
a single instruction. The mftb simplified mnemonic moves from the lower half of the time 
base register (TBL) to a GPR, and the mftbu simplified mnemonic moves from the upper 
half of the time base (TBU) to a GPR. 
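Because TBL and TBU are read by separate instructions, TBL can wrap between the two reads. The usual remedy is to read TBU, then TBL, then TBU again, and retry if the two TBU values differ. The C sketch below shows that loop with the two reads passed in as callbacks standing in for mftbu and mftb. The callbacks, the `ToyTb` simulator (which advances by one on every read), and all names are hypothetical.

```c
#include <stdint.h>

typedef uint32_t (*TbReader)(void *ctx);

/* Read a consistent 64-bit time base using 32-bit reads. */
static uint64_t read_time_base(TbReader read_tbu, TbReader read_tbl, void *ctx)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_tbu(ctx);          /* like mftbu */
        lo  = read_tbl(ctx);          /* like mftb */
        hi2 = read_tbu(ctx);          /* TBU again */
    } while (hi != hi2);              /* TBL wrapped in between: retry */
    return ((uint64_t)hi << 32) | lo;
}

/* Toy time base for demonstration: advances by one on every read. */
typedef struct { uint64_t tb; unsigned reads; } ToyTb;

static uint32_t toy_tbu(void *ctx) { ToyTb *t = ctx; t->tb++; t->reads++; return (uint32_t)(t->tb >> 32); }
static uint32_t toy_tbl(void *ctx) { ToyTb *t = ctx; t->tb++; t->reads++; return (uint32_t)t->tb; }
```

Starting the toy counter at 0xFFFFFFFE makes TBL wrap during the first attempt. A naive read would then return 0, while the loop retries and returns 0x100000003.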


Table 4-28. Move from Time Base Instruction 





Move from Time Base: mftb rD,TBR
    The TBR field denotes either time base lower or time base upper, encoded
    as shown in Table 4-29 and Table 4-30. The contents of the designated
    register are copied to rD.


Table 4-29 summarizes the time base (TBL/TBU) register encodings to which user-level 
access (using mftb) is permitted (as specified by the VEA). 


Table 4-29. User-Level TBR Encodings (VEA) 


Decimal   tbr[0–4] tbr[5–9]   Register Name   Description
268       01100 01000         TBL             Time base lower (read-only)
269       01101 01000         TBU             Time base upper (read-only)
Table 4-30 summarizes the TBL and TBU register encodings to which supervisor-level
access (using mtspr) is permitted. 


Table 4-30. Supervisor-Level TBR Encodings (VEA) 





Decimal   spr[0–4] spr[5–9]   Register Name   Description
284       11100 01000         TBL             Time base lower (write-only)
285       11101 01000         TBU             Time base upper (write-only)


Note: Moving from the time base (TBL and TBU) can also be accomplished with the mftb instruction.


4.3.2 Memory Synchronization Instructions—VEA 


Memory synchronization instructions control the order in which memory operations are
completed with respect to asynchronous events, and the order in which memory operations 
are seen by other processors or memory access mechanisms. See Chapter 5, “Cache Model 
and Memory Coherency,” for additional information about these instructions and about 
related aspects of memory synchronization. 


System designs that use a second-level cache should take special care to recognize the
hardware signaling caused by a sync operation and perform the appropriate actions to
guarantee that memory references that may be queued internally to the second-level cache 
have been performed globally. 


In addition to the sync instruction (specified by the UISA), the VEA defines the Enforce In-Order
Execution of I/O (eieio) and Instruction Synchronize (isync) instructions; see
Table 4-31. The number of cycles required to complete an eieio instruction depends on 
system parameters and on the processor's state when the instruction is issued. As a result, 
frequent use of this instruction may degrade performance slightly. 


The isync instruction causes the processor to wait for any preceding instructions to
complete, discard all prefetched instructions, and then branch to the next sequential 
instruction (which has the effect of clearing the pipeline behind the isync instruction). 
Table 4-31. Memory Synchronization Instructions—VEA


Enforce In-Order Execution of I/O: eieio (no operands)
    The eieio instruction provides an ordering function for the effects of loads
    and stores executed by a processor.

Instruction Synchronize: isync (no operands)
    Executing an isync instruction ensures that all previous instructions
    complete before the isync instruction completes, although memory
    accesses caused by those instructions need not have been performed
    with respect to other processors and mechanisms. It also ensures that the
    processor initiates no subsequent instructions until the isync instruction
    completes. Finally, it causes the processor to discard any prefetched
    instructions, so subsequent instructions will be fetched and executed in
    the context established by the instructions preceding the isync
    instruction.
    This instruction does not affect other processors or their caches.


4.3.3 Memory Control Instructions—VEA 


Memory control instructions include the following types: 


• Cache management instructions (user-level and supervisor-level)
• Segment register manipulation instructions
• Segment lookaside buffer management instructions
• Translation lookaside buffer management instructions


This section describes the user-level cache management instructions defined by the VEA. 
See Section 4.4.3, “Memory Control Instructions—OEA,” for more information about 
supervisor-level cache, segment register manipulation, and translation lookaside buffer 
management instructions. 


4.3.3.1 User-Level Cache Instructions—VEA 


The instructions summarized in this section provide user-level programs the ability to 
manage on-chip caches if they are implemented. See Chapter 5, “Cache Model and 
Memory Coherency,” for more information about cache topics. 


As with other memory-related instructions, the effects of the cache management instructions
on memory are weakly ordered. If the programmer needs to ensure that cache or other
instructions have been performed with respect to all other processors and system
mechanisms, a sync instruction must be placed in the program following those instructions.


Note that when data address translation is disabled (MSR[DR] = 0), the Data Cache Block
Clear to Zero (dcbz) and the Data Cache Block Allocate (dcba) instructions allocate a
cache block in the cache and may not verify that the physical address (referred to as real
address in the architecture specification) is valid. If a cache block is created for an invalid
physical address, a machine check condition may result when an attempt is made to write
that cache block back to memory. The cache block could be written back either when an
instruction causes a cache miss and the block with the invalid address is selected for
replacement, or when a Data Cache Block Store (dcbst) instruction addresses it.


Any cache control instruction that generates an effective address that corresponds to a
direct-store segment (segment descriptor[T] = 1) is treated as a no-op. Note, however, that
the direct-store facility is being phased out of the architecture and is not likely to be
supported in future devices.


Table 4-32 summarizes the cache instructions defined by the VEA. Note that these 
instructions are accessible to user-level programs. 


Table 4-32. User-Level Cache Instructions

Name | Mnemonic | Operand Syntax
    Operation

Data Cache Block Touch | dcbt | rA,rB
    The EA is the sum (rA|0) + (rB).
    This instruction is a hint that performance will probably be improved if the
    block containing the byte addressed by EA is fetched into the data cache,
    because the program will probably soon load from the addressed byte.

Data Cache Block Touch for Store | dcbtst | rA,rB
    The EA is the sum (rA|0) + (rB).
    This instruction is a hint that performance will probably be improved if the
    block containing the byte addressed by EA is fetched into the data cache,
    because the program will probably soon store into the addressed byte.

Data Cache Block Allocate | dcba | rA,rB
    The EA is the sum (rA|0) + (rB).
    If the cache block containing the byte addressed by the EA is in the data
    cache, all bytes of the cache block are made undefined, but the cache block
    is still considered valid. Note that programming errors can occur if the data
    in this cache block is subsequently read or used inadvertently.
    If the cache block containing the byte addressed by the EA is not in the data
    cache and the corresponding page is marked caching allowed (I = 0), the cache
    block is allocated (and made valid) in the data cache without fetching the
    block from main memory, and the value of all bytes of the cache block is
    undefined.
    If the page containing the byte addressed by the EA is marked caching
    inhibited (WIM = x1x), this instruction is treated as a no-op.
    If the cache block addressed by the EA is located in a page marked as memory
    coherent (WIM = xx1) and the cache block exists in the caches of other
    processors, memory coherence is maintained in those caches.
    The dcba instruction is treated as a store to the addressed byte with respect
    to address translation, memory protection, referenced and changed recording,
    and the ordering enforced by eieio or by the combination of caching-inhibited
    and guarded attributes for a page.
    This instruction is optional in the PowerPC architecture.
    (In the PowerPC OEA, the dcba instruction is additionally defined to clear
    all bytes of a newly established block to zero in the case that the block did
    not already exist in the cache.)


Data Cache Block Clear to Zero | dcbz | rA,rB
    The EA is the sum (rA|0) + (rB).
    If the cache block containing the byte addressed by the EA is in the data
    cache, all bytes of the cache block are cleared to zero.
    If the cache block containing the byte addressed by the EA is not in the data
    cache and the corresponding page is marked caching allowed (I = 0), the cache
    block is established in the data cache without fetching the block from main
    memory, and all bytes of the cache block are cleared to zero.
    If the page containing the byte addressed by the EA is marked caching
    inhibited (WIM = x1x) or write-through (WIM = 1xx), either all bytes of the
    area of main memory that corresponds to the addressed cache block are cleared
    to zero, or an alignment exception occurs.
    If the cache block addressed by the EA is located in a page marked as memory
    coherent (WIM = xx1) and the cache block exists in the caches of other
    processors, memory coherence is maintained in those caches.
    The dcbz instruction is treated as a store to the addressed byte with respect
    to address translation, memory protection, referenced and changed recording,
    and the ordering enforced by eieio or by the combination of caching-inhibited
    and guarded attributes for a page.

Data Cache Block Store | dcbst | rA,rB
    The EA is the sum (rA|0) + (rB).
    If the cache block containing the byte addressed by the EA is located in a
    page marked memory coherent (WIM = xx1), and a cache block containing the
    byte addressed by EA is in the data cache of any processor and has been
    modified, the cache block is written to main memory.
    If the cache block containing the byte addressed by the EA is located in a
    page not marked memory coherent (WIM = xx0), and a cache block containing the
    byte addressed by EA is in the data cache of this processor and has been
    modified, the cache block is written to main memory.
    The function of this instruction is independent of the
    write-through/write-back and caching-inhibited/caching-allowed modes of the
    cache block containing the byte addressed by the EA.
    The dcbst instruction is treated as a load from the addressed byte with
    respect to address translation and memory protection. It may also be treated
    as a load for referenced and changed bit recording, except that referenced
    and changed bit recording may not occur.




Data Cache Block Flush | dcbf | rA,rB
    The EA is the sum (rA|0) + (rB).
    The action taken depends on the memory mode associated with the target, and
    on the state of the block. The following list describes the action taken for
    the various cases, regardless of whether the page or block containing the
    addressed byte is designated as write-through or if it is in the
    caching-inhibited or caching-allowed mode.
    • Coherency required (WIM = xx1)
      - Unmodified block: Invalidates copies of the block in the caches of all
        processors.
      - Modified block: Copies the block to memory. Invalidates copies of the
        block in the caches of all processors.
      - Absent block: If modified copies of the block are in the caches of other
        processors, causes them to be copied to memory and invalidated. If
        unmodified copies are in the caches of other processors, causes those
        copies to be invalidated.
    • Coherency not required (WIM = xx0)
      - Unmodified block: Invalidates the block in the processor's cache.
      - Modified block: Copies the block to memory. Invalidates the block in the
        processor's cache.
      - Absent block: Does nothing.
    The function of this instruction is independent of the
    write-through/write-back and caching-inhibited/caching-allowed modes of the
    cache block containing the byte addressed by the EA.
    The dcbf instruction is treated as a load from the addressed byte with
    respect to address translation and memory protection. It may also be treated
    as a load for referenced and changed bit recording, except that referenced
    and changed bit recording may not occur.

Instruction Cache Block Invalidate | icbi | rA,rB
    The EA is the sum (rA|0) + (rB).
    If the cache block containing the byte addressed by EA is located in a page
    marked memory coherent (WIM = xx1), and a cache block containing the byte
    addressed by EA is in the instruction cache of any processor, the cache block
    is made invalid in all such instruction caches, so that the next reference
    causes the cache block to be refetched.
    If the cache block containing the byte addressed by EA is located in a page
    not marked memory coherent (WIM = xx0), and a cache block containing the byte
    addressed by EA is in the instruction cache of this processor, the cache
    block is made invalid in that instruction cache, so that the next reference
    causes the cache block to be refetched.
    The function of this instruction is independent of the
    write-through/write-back and caching-inhibited/caching-allowed modes of the
    cache block containing the byte addressed by the EA.
    The icbi instruction is treated as a load from the addressed byte with
    respect to address translation and memory protection. It may also be treated
    as a load for referenced and changed bit recording, except that referenced
    and changed bit recording may not occur.
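
Every instruction in Table 4-32 forms its EA the same way. In the notation (rA|0), a value of 0 in the rA field selects the constant 0 rather than the contents of GPR0; the instruction then acts on the whole cache block that contains the addressed byte. The C model below shows both steps. The register-file array, the function names, and the 32-byte block size are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical block size; real cache block sizes are implementation-specific. */
#define CACHE_BLOCK_BYTES 32u

/* EA = (rA|0) + (rB): an rA field of 0 means the constant 0, not GPR0. */
static uint32_t indexed_ea(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    uint32_t base = (ra == 0) ? 0u : gpr[ra];
    return base + gpr[rb];   /* unsigned arithmetic wraps modulo 2^32 */
}

/* Address of the first byte of the cache block that contains `ea`. */
static uint32_t cache_block_base(uint32_t ea)
{
    return ea & ~(CACHE_BLOCK_BYTES - 1u);
}
```

For example, under the assumed 32-byte block, dcbz 0,r4 with r4 = 0x2045 clears the block that starts at 0x2040, whatever GPR0 happens to contain.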
4.3.4 External Control Instructions


The external control instructions allow a user-level program to communicate with a special- 
purpose device. Two instructions are provided and are summarized in Table 4-33. 


Table 4-33. External Control Instructions

Name | Mnemonic | Operand Syntax
    Operation

External Control In Word Indexed | eciwx | rD,rA,rB
    The EA is the sum (rA|0) + (rB).
    A load word request for the physical address corresponding to the EA is sent
    to the device identified by EAR[RID] (bits 26-31), bypassing the cache. The
    word returned by the device is placed into rD. The EA sent to the device must
    be word-aligned.
    This instruction is treated as a load from the addressed byte with respect to
    address translation, memory protection, referenced and changed recording, and
    the ordering performed by eieio.
    This instruction is optional.

External Control Out Word Indexed | ecowx | rS,rA,rB
    The EA is the sum (rA|0) + (rB).
    A store word request for the physical address corresponding to the EA and the
    contents of rS are sent to the device identified by EAR[RID] (bits 26-31),
    bypassing the cache. The EA sent to the device must be word-aligned.
    This instruction is treated as a store to the addressed byte with respect to
    address translation, memory protection, referenced and changed recording, and
    the ordering performed by eieio. Software synchronization is required to
    ensure that the data access is performed in program order with respect to
    data accesses caused by other store or ecowx instructions, even though the
    addressed byte is assumed to be caching-inhibited and guarded.
    This instruction is optional.
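
PowerPC numbers register bits from the most-significant end, so EAR[RID] (bits 26-31) is the six low-order bits of the 32-bit EAR. The sketch below extracts that field with a general bit-field helper and checks the word-alignment requirement on the EA. The function names are hypothetical, and the sketch only reports whether the EA is acceptable; it does not model what a processor does with a misaligned EA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract bits first..last of a 32-bit register (bit 0 = most-significant bit). */
static uint32_t ppc_field(uint32_t reg, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (reg >> (31 - last)) & mask;
}

/* EAR[RID], bits 26-31: identifies the device addressed by eciwx/ecowx. */
static uint32_t ear_rid(uint32_t ear)
{
    return ppc_field(ear, 26, 31);
}

/* The EA sent to the device must be word-aligned. */
static bool ext_control_ea_ok(uint32_t ea)
{
    return (ea & 3u) == 0;
}
```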
4.4 PowerPC OEA Instructions


The PowerPC operating environment architecture (OEA) includes the structure of the
memory management model, supervisor-level registers, and the exception model.
Implementations that conform to the OEA also adhere to the UISA and the VEA. This 
section describes the instructions provided by the OEA. 


4.4.1 System Linkage Instructions—OEA 


This section describes the system linkage instructions (see Table 4-34). The sc instruction
is a user-level instruction that permits a user program to call on the system to perform a
service and causes the processor to take an exception. The rfi instruction is a
supervisor-level instruction that is useful for returning from an exception handler.


Table 4-34. System Linkage Instructions—OEA

Name | Mnemonic | Operand Syntax
    Operation

System Call | sc | none
    When executed, the effective address of the instruction following the sc
    instruction is placed into SRR0. Bits 1-4 and 10-15 of SRR1 are cleared.
    Additionally, bits 16-23, 25-27, and 30-31 of the MSR are placed into the
    corresponding bits of SRR1. Depending on the implementation, additional bits
    of MSR may also be saved in SRR1. Then a system call exception is generated.
    The exception causes the MSR to be altered as described in Section 6.4,
    “Exception Definitions.”
    The exception causes the next instruction to be fetched from offset 0xC00
    from the base physical address indicated by the new setting of MSR[IP].
    This instruction is context synchronizing.

Return from Interrupt | rfi | none
    Bits 16-23, 25-27, and 30-31 of SRR1 are placed into the corresponding bits
    of the MSR. Depending on the implementation, additional bits of MSR may also
    be restored from SRR1. If the new MSR value does not enable any pending
    exceptions, the next instruction is fetched, under control of the new MSR
    value, from the address SRR0[0-29] || 0b00.
    If the new MSR value enables one or more pending exceptions, the exception
    associated with the highest priority pending exception is generated; in this
    case the value placed into SRR0 (machine status save/restore 0) by the
    exception processing mechanism is the address of the instruction that would
    have been executed next had the exception not occurred.
    This is a supervisor-level instruction and is context-synchronizing.
    This instruction is defined only for 32-bit implementations. The use of the
    rfi instruction on a 64-bit implementation will invoke the system exception
    handler.
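
The bit lists in Table 4-34 are easier to check as masks. With bit 0 as the most-significant bit, bit n of a 32-bit register has the value 1 << (31 - n). The C model below computes the SRR0 and SRR1 values that sc produces and the MSR and next-instruction address that rfi restores. It assumes the usual meaning of MSR[IP] (bit 25): an exception base of 0x0000_0000 when IP = 0 and 0xFFF0_0000 when IP = 1. It leaves SRR1 bits that the table does not mention at zero, and it omits implementation-specific bits, the other MSR changes described in Section 6.4, and pending-exception handling. All names are hypothetical.

```c
#include <stdint.h>

/* Mask with bits first..last set, using PowerPC numbering (bit 0 = MSB). */
static uint32_t ppc_bits(unsigned first, unsigned last)
{
    uint32_t m = 0;
    for (unsigned b = first; b <= last; b++)
        m |= 1u << (31 - b);
    return m;
}

/* MSR bits 16-23, 25-27, and 30-31: saved in SRR1 by sc, restored by rfi. */
static uint32_t msr_saved_mask(void)
{
    return ppc_bits(16, 23) | ppc_bits(25, 27) | ppc_bits(30, 31);
}

struct sc_state  { uint32_t srr0, srr1; };
struct rfi_state { uint32_t msr, next_pc; };

/* sc at address `pc`. SRR1 bits 1-4 and 10-15 end up cleared; bits the table
   does not mention are also left 0 here, which is an assumption of the model. */
static struct sc_state model_sc(uint32_t pc, uint32_t msr)
{
    struct sc_state s;
    s.srr0 = pc + 4;                   /* EA of the instruction after sc */
    s.srr1 = msr & msr_saved_mask();   /* copy only the architected MSR bits */
    return s;
}

/* Vector address: offset 0xC00 from the base selected by the new MSR[IP]. */
static uint32_t sc_vector(uint32_t new_msr)
{
    return (new_msr & ppc_bits(25, 25)) ? 0xFFF00C00u : 0x00000C00u;
}

/* rfi: restore the architected MSR bits from SRR1, resume at SRR0[0-29] || 0b00. */
static struct rfi_state model_rfi(uint32_t msr, uint32_t srr0, uint32_t srr1)
{
    struct rfi_state r;
    r.msr = (msr & ~msr_saved_mask()) | (srr1 & msr_saved_mask());
    r.next_pc = srr0 & ~3u;
    return r;
}
```

Working the masks out by hand gives 0x0000_FF73 for the saved MSR bits and 0x783F_0000 for SRR1 bits 1-4 and 10-15, which is a quick way to cross-check an exception handler's save and restore code.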
4.4.2 Processor Control Instructions—OEA


This section describes the processor control instructions that are used to read from and 
write to the MSR and the SPRs. 


4.4.2.1 Move to/from Machine State Register Instructions 
Table 4-35 summarizes the instructions used for reading from and writing to the MSR. 


Table 4-35. Move to/from Machine State Register Instructions

Name | Mnemonic | Operand Syntax
    Operation

Move to Machine State Register | mtmsr | rS
    The contents of rS are placed into the MSR.
    This instruction is a supervisor-level instruction and is context
    synchronizing except with respect to alterations to the POW and LE bits.
    Refer to Section 2.3.17, “Synchronization Requirements for Special Registers
    and for Lookaside Buffers,” for more information.

Move from Machine State Register | mfmsr | rD
    The contents of the MSR are placed into rD. This is a supervisor-level
    instruction.
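
A typical use of these instructions is a read-modify-write of a single MSR bit. The sketch below sets MSR[EE] (bit 16, the external interrupt enable, so the mask is 0x0000_8000) and returns the previous MSR value so that a caller can restore it later. Because mtmsr is context synchronizing for bits other than POW and LE, no extra synchronization is shown; changing POW or LE requires the sequences referenced in Section 2.3.17. GCC-style inline assembly and supervisor mode are assumed, the function names are hypothetical, and on non-PowerPC hosts a plain variable stands in for the MSR.

```c
#include <stdint.h>

#define MSR_EE 0x00008000u   /* MSR[EE], bit 16: external interrupt enable */

#if defined(__powerpc__)
static uint32_t read_msr(void)    { uint32_t v; __asm__ volatile("mfmsr %0" : "=r"(v)); return v; }
static void     write_msr(uint32_t v) { __asm__ volatile("mtmsr %0" : : "r"(v) : "memory"); }
#else
static uint32_t fake_msr;            /* host stand-in for the MSR */
static uint32_t read_msr(void)    { return fake_msr; }
static void     write_msr(uint32_t v) { fake_msr = v; }
#endif

/* Enable external interrupts; return the old MSR so the caller can restore it. */
static uint32_t enable_external_interrupts(void)
{
    uint32_t old = read_msr();   /* mfmsr rD */
    write_msr(old | MSR_EE);     /* mtmsr rS: context synchronizing for EE */
    return old;
}
```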


4.4.2.2 Move to/from Special-Purpose Register Instructions (OEA) 
This section briefly describes the mtspr and mfspr instructions (see Table 4-36). For
more detailed information, see Chapter 8, “Instruction Set.” Simplified mnemonics for the
mtspr and mfspr instructions are provided in Appendix F, “Simplified Mnemonics.”
For a discussion of context synchronization requirements when altering certain SPRs, refer
to Appendix E, “Synchronization Programming Examples.”





Table 4-36. Move to/from Special-Purpose Register Instructions (OEA)

Name | Mnemonic | Operand Syntax
    Operation

Move to Special-Purpose Register | mtspr | SPR,rS
    The SPR field denotes a special-purpose register. The contents of rS are
    placed into the designated SPR. For SPRs that are 32 bits long, the contents
    of rS are placed into the SPR.
    For this instruction, SPRs TBL and TBU are treated as separate 32-bit
    registers; setting one leaves the other unaltered.

Move from Special-Purpose Register | mfspr | rD,SPR
    The SPR field denotes a special-purpose register. The contents of the
    designated SPR are placed into rD.


For mtspr and mfspr instructions, the SPR number coded in assembly language does not 
appear directly as a 10-bit binary number in the instruction. The number coded is split into 
two 5-bit halves that are reversed in the instruction encoding, with the high-order 5 bits 
appearing in bits 16-20 of the instruction encoding and the low-order 5 bits in bits 11-15.
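
The half-swap is easy to get wrong when hand-assembling or disassembling these instructions. The C sketch below builds mfspr and mtspr instruction words from an SPR number. It assumes the standard encodings: primary opcode 31 in bits 0-5, rD or rS in bits 6-10, the swapped SPR field in bits 11-20, and extended opcode 339 (mfspr) or 467 (mtspr) in bits 21-30. The function names are hypothetical. In the example values, SPR 8 is the link register (LR) and SPR 287 is the processor version register (PVR).

```c
#include <stdint.h>

/* Exchange the two 5-bit halves of a 10-bit SPR number (the field in bits 11-20). */
static uint32_t spr_field(uint32_t spr)
{
    return ((spr & 0x1Fu) << 5) | ((spr >> 5) & 0x1Fu);
}

/* Opcode 31 in bits 0-5, register in 6-10, SPR field in 11-20, XO in 21-30. */
static uint32_t encode_spr_insn(uint32_t xo, uint32_t reg, uint32_t spr)
{
    return (31u << 26) | ((reg & 0x1Fu) << 21) | (spr_field(spr) << 11) | (xo << 1);
}

static uint32_t encode_mfspr(uint32_t rd, uint32_t spr) { return encode_spr_insn(339u, rd, spr); }
static uint32_t encode_mtspr(uint32_t spr, uint32_t rs) { return encode_spr_insn(467u, rs, spr); }
```

With these encodings, mfspr r0,8 (mflr r0) assembles to 0x7C0802A6 and mtspr 8,r0 (mtlr r0) to 0x7C0803A6. For mfspr r3,287 (mfpvr r3) the SPR number 287 = 0b01000_11111 is stored with its halves exchanged, giving 0x7C7F42A6.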
For information on SPR encodings (both user- and supervisor-level), see Chapter 8,
“Instruction Set.” Note that there are additional SPRs specific to each implementation; for 
implementation-specific SPRs, see the user’s manual for that particular processor. 


4.4.3 Memory Control Instructions—OEA 
Memory control instructions include the following types of instructions: 


• Cache management instructions (supervisor-level and user-level)
• Segment register manipulation instructions
• Translation lookaside buffer management instructions


This section describes supervisor-level memory control instructions. See Section 4.3.3, 
“Memory Control Instructions—VEA,” for more information about user-level cache 
management instructions. 


4.4.3.1 Supervisor-Level Cache Management Instruction 

Table 4-37 summarizes the operation of the only supervisor-level cache management 
instruction. See Section 4.3.3.1, “User-Level Cache Instructions—VEA,” for cache 
instructions that provide user-level programs the ability to manage the on-chip caches. 


Any cache control instruction that generates an effective address that corresponds to a
direct-store segment (segment descriptor[T] = 1) is treated as a no-op. Note, however, that
the direct-store facility is being phased out of the architecture and is not likely to be
supported in future devices.
Table 4-37. Cache Management Supervisor-Level Instruction

Name | Mnemonic | Operand Syntax
    Operation

Data Cache Block Invalidate | dcbi | rA,rB
    The EA is the sum (rA|0) + (rB).
    The action taken depends on the memory mode associated with the target, and
    on the state (modified, unmodified) of the cache block. The following list
    describes the action to take when the cache block containing the byte
    addressed by the EA is or is not in the cache.
    • Coherency required (WIM = xx1)
      - Unmodified cache block: Invalidates copies of the cache block in the
        caches of all processors.
      - Modified cache block: Invalidates copies of the cache block in the caches
        of all processors. (Discards the modified contents.)
      - Absent cache block: If copies are in the caches of any other processor,
        causes the copies to be invalidated. (Discards any modified contents.)
    • Coherency not required (WIM = xx0)
      - Unmodified cache block: Invalidates the cache block in the local cache.
      - Modified cache block: Invalidates the cache block in the local cache.
        (Discards the modified contents.)
      - Absent cache block: No action is taken.
    When data address translation is enabled (MSR[DR] = 1) and the logical
    (effective) address has no translation, a data access exception occurs.
    The function of this instruction is independent of the write-through and
    caching-inhibited/allowed modes determined by the WIM bit settings of the
    block containing the byte addressed by the EA.
    This instruction is treated as a store to the addressed byte with respect to
    address translation and protection, except that the change bit need not be
    set, and if the change bit is not set then the reference bit need not be set.
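
The case lists for dcbf in Table 4-32 and for dcbi above differ only in what happens to modified data: dcbf copies it to memory before invalidating, whereas dcbi discards it. The C sketch below encodes both lists for the processor executing the instruction. The enum, struct, and function names are hypothetical, and the `broadcast` flag only records that, when coherency is required (WIM = xx1), the same operation is applied to copies in other processors' caches; it does not model those caches.

```c
#include <stdbool.h>

enum block_state { BLOCK_ABSENT, BLOCK_UNMODIFIED, BLOCK_MODIFIED };

struct cache_action {
    bool write_back;   /* copy this processor's modified block to memory */
    bool invalidate;   /* invalidate this processor's copy */
    bool broadcast;    /* other processors apply the same operation to their copies */
};

/* is_flush = true models dcbf, false models dcbi; wim_m is the M (coherency) bit. */
static struct cache_action block_op(bool is_flush, bool wim_m, enum block_state st)
{
    struct cache_action a = { false, false, wim_m };
    if (st != BLOCK_ABSENT) {
        a.invalidate = true;                               /* both instructions invalidate */
        a.write_back = is_flush && st == BLOCK_MODIFIED;   /* only dcbf keeps modified data */
    }
    return a;
}
```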


4.4.3.2 Segment Register Manipulation Instructions 


The instructions listed in Table 4-38 provide access to the segment registers for 32-bit 
implementations. These instructions operate completely independently of the MSR[IR] and 
MSR[DR] bit settings. Refer to Section 2.3.17, “Synchronization Requirements for Special 
Registers and for Lookaside Buffers,” for serialization requirements and other
recommended precautions to observe when manipulating the segment registers. 
Table 4-38. Segment Register Manipulation Instructions

Name | Mnemonic | Operand Syntax
    Operation

Move to Segment Register (32-bit only) | mtsr | SR,rS
    The contents of rS are placed into the segment register specified by operand
    SR. This is a supervisor-level instruction.

Move to Segment Register Indirect (32-bit only) | mtsrin | rS,rB
    The contents of rS are copied to the segment register selected by bits 0-3
    of rB. This is a supervisor-level instruction.

Move from Segment Register (32-bit only) | mfsr | rD,SR
    The contents of the segment register specified by operand SR are placed into
    rD. This is a supervisor-level instruction.

Move from Segment Register Indirect (32-bit only) | mfsrin | rD,rB
    The contents of the segment register selected by bits 0-3 of rB are copied
    into rD. This is a supervisor-level instruction.
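
Because bit 0 is the most-significant bit, the segment register selected by bits 0-3 of rB is simply rB >> 28, so each of the 16 segment registers covers one 256-Mbyte region of the 32-bit address space. The model below uses a hypothetical array in place of SR0-SR15 and shows a loop that loads all 16 registers through mtsrin by stepping rB in units of 0x1000_0000. It models only the register selection, not the supervisor-level check or the synchronization requirements of Section 2.3.17.

```c
#include <stdint.h>

static uint32_t sim_sr[16];   /* hypothetical stand-in for SR0-SR15 */

/* Bits 0-3 of rB (its four most-significant bits) select the segment register. */
static unsigned sr_index(uint32_t rb)
{
    return rb >> 28;
}

/* Models of mtsrin rS,rB and mfsrin rD,rB. */
static void     model_mtsrin(uint32_t rs, uint32_t rb) { sim_sr[sr_index(rb)] = rs; }
static uint32_t model_mfsrin(uint32_t rb)              { return sim_sr[sr_index(rb)]; }

/* Load all 16 segment registers, one per 256-Mbyte segment. */
static void load_all_segments(const uint32_t values[16])
{
    uint32_t ea = 0;
    for (unsigned i = 0; i < 16; i++, ea += 0x10000000u)
        model_mtsrin(values[i], ea);
}
```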


4.4.3.3 Translation Lookaside Buffer Management Instructions 

The address translation mechanism is defined in terms of segment descriptors and page 
table entries (PTEs) used by PowerPC processors to locate the logical-to-physical address 
mapping for a particular access. These segment descriptors and PTEs reside in segment 
tables and page tables in memory, respectively. 








For performance reasons, many processors implement one or more translation lookaside
buffers (TLBs) on-chip. These are caches of portions of the page table. As changes are made to the
address translation tables, it is necessary to maintain coherency between the TLB and the
updated tables. This is done by invalidating TLB entries, or occasionally by invalidating the
entire TLB, and allowing the translation caching mechanism to refetch from the tables.


Each PowerPC implementation that has a TLB provides means for invalidating an 
individual TLB entry and invalidating the entire TLB. 


If a processor does not implement a TLB, it treats the corresponding instructions (tlbie, 
tlbia, and tlbsync) either as no-ops or as illegal instructions. 
Refer to Chapter 7, “Memory Management,” for more information about TLB operation.
Table 4-39 summarizes the operation of the TLB management instructions.


Table 4-39. Translation Lookaside Buffer Management Instructions

Name | Mnemonic | Operand Syntax
    Operation

TLB Invalidate Entry | tlbie | rB
    The EA is the contents of rB. If the TLB contains an entry corresponding to
    the EA, that entry is removed from the TLB. The TLB search is performed
    regardless of the settings of MSR[IR] and MSR[DR]. Block address translation
    for the EA, if any, is ignored.
    This instruction causes the target TLB entry to be invalidated in all
    processors.
    The operation performed by this instruction is treated as a caching-inhibited
    and guarded data access with respect to the ordering performed by eieio.
    This is a supervisor-level instruction and optional in the PowerPC
    architecture.

TLB Invalidate All | tlbia | none
    All TLB entries are made invalid. The TLB is invalidated regardless of the
    settings of MSR[IR] and MSR[DR].
    This instruction does not cause the entries to be invalidated in other
    processors.
    This is a supervisor-level instruction and optional in the PowerPC
    architecture.

TLB Synchronize | tlbsync | none
    Executing a tlbsync instruction ensures that all tlbie instructions
    previously executed by the processor executing the tlbsync instruction have
    completed on all processors.
    The operation performed by this instruction is treated as a caching-inhibited
    and guarded data access with respect to the ordering performed by eieio.
    This is a supervisor-level instruction and optional in the PowerPC
    architecture.


Because the presence and exact semantics of the translation lookaside buffer management
instructions are implementation-dependent, system software should incorporate uses of these
instructions into subroutines to minimize compatibility problems.
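
A sketch of such a subroutine for a multiprocessor system follows. It relies on the properties stated in Table 4-39: tlbie invalidates the entry in all processors, tlbie and tlbsync are both ordered by eieio, and tlbsync waits until earlier tlbie operations have completed on all processors. The leading sync (so that a preceding page-table update is performed first) and the trailing sync (waiting for the tlbsync to complete) follow common practice and are assumptions here. GCC-style inline assembly and supervisor mode are assumed; `tlb_invalidate_ea` and the `op_*` helpers are hypothetical, and on non-PowerPC hosts the helpers only record the sequence.

```c
#include <stdint.h>
#include <string.h>

#if defined(__powerpc__)
static void op_sync(void)         { __asm__ volatile("sync"     : : : "memory"); }
static void op_eieio(void)        { __asm__ volatile("eieio"    : : : "memory"); }
static void op_tlbsync(void)      { __asm__ volatile("tlbsync"  : : : "memory"); }
static void op_tlbie(uint32_t ea) { __asm__ volatile("tlbie %0" : : "r"(ea) : "memory"); }
#else
/* Off-target stand-ins: record the instruction sequence as text. */
static char tlb_trace[64];
static void op_log(const char *s)
{
    strncat(tlb_trace, s, sizeof tlb_trace - strlen(tlb_trace) - 1);
}
static void op_sync(void)         { op_log("sync "); }
static void op_eieio(void)        { op_log("eieio "); }
static void op_tlbsync(void)      { op_log("tlbsync "); }
static void op_tlbie(uint32_t ea) { (void)ea; op_log("tlbie "); }
#endif

/* Hypothetical subroutine: remove the translation for `ea` on every processor. */
static void tlb_invalidate_ea(uint32_t ea)
{
    op_sync();      /* assumption: let an earlier page-table update be performed first */
    op_tlbie(ea);   /* invalidate the matching TLB entry in all processors */
    op_eieio();     /* order the tlbie ahead of the tlbsync */
    op_tlbsync();   /* wait for earlier tlbie operations on all processors */
    op_sync();      /* wait for the tlbsync itself to complete */
}
```

Keeping the whole sequence in one routine means that a processor without a TLB, or with different TLB semantics, needs only this routine changed.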
Chapter 5
Cache Model and Memory Coherency 


This chapter summarizes the cache model as defined by the virtual environment
architecture (VEA) as well as the built-in architectural controls for maintaining memory
coherency. This chapter describes the cache control instructions and special concerns for
memory coherency in single-processor and multiprocessor systems. Aspects of the
operating environment architecture (OEA) as they relate to the cache model and memory
coherency are also covered.


The PowerPC architecture provides for relaxed memory coherency. Features such as write- 
back caching and out-of-order execution allow software engineers to exploit the 
performance benefits of weakly-ordered memory access. The architecture also provides the 
means to control the order of accesses for order-critical operations. 


In this chapter, the term multiprocessor is used in the context of maintaining cache 
coherency. In this context, a system could include other devices that access system memory, 
maintain independent caches, and function as bus masters. 


Each cache management instruction operates on an aligned unit of memory. The VEA 
defines this cacheable unit as a block. Since the term ‘block’ is easily confused with the unit 
of memory addressed by the block address translation (BAT) mechanism, this chapter uses 
the term ‘cache block’ to indicate the cacheable unit. The size of the cache block can vary 
by instruction and by implementation. In addition, the unit of memory at which coherency 
is maintained is called the coherence block. The size of the coherence block is also 
implementation-specific. However, the coherence block is often the same size as the cache 
block. 


5.1 The Virtual Environment 


The user instruction set architecture (UISA) relies upon a memory space of 2^32 bytes for
applications. The VEA expands upon the memory model by introducing virtual memory,
caches, and shared memory multiprocessing. Although many applications will not need to
access the features introduced by the VEA, it is important that programmers be aware that
they are working in a virtual environment where the physical memory may be shared by
multiple processes running on one or more processors.


This section describes load and store ordering, atomicity, the cache model, memory
coherency, and the VEA cache management instructions. The features of the VEA are 
accessible to both user-level and supervisor-level applications (referred to as problem state 
and privileged state, respectively, in the architecture specification). 


The mechanism for controlling the virtual memory space is defined by the OEA. The 
features of the OEA are accessible to supervisor-level applications only (typically operating 
systems). For more information on the address translation mechanism, refer to Chapter 7, 
“Memory Management.” 


5.1.1 Memory Access Ordering 


The VEA specifies a weakly consistent memory model for shared memory multiprocessor 
systems. This model provides an opportunity for significantly improved performance over 
a model that has stronger consistency rules, but places the responsibility for access ordering 
on the programmer. When a program requires strict access ordering for proper execution, 
the programmer must insert the appropriate ordering or synchronization instructions into 
the program. 


The order in which the processor performs memory accesses, the order in which those 
accesses complete in memory, and the order in which those accesses are viewed as 
occurring by another processor may all be different. A means of enforcing memory access 
ordering is provided to allow programs (or instances of programs) to share memory. Similar 
means are needed to allow programs executing on a processor to share memory with some 
other mechanism, such as an I/O device, that can also access memory. 


Various facilities are provided that enable programs to control the order in which memory 
accesses are performed by separate instructions. First, if separate store instructions access 
memory that is designated as both caching-inhibited and guarded, the accesses are 
performed in the order specified by the program. Refer to Section 5.1.4, “Memory 
Coherency,” and Section 5.2.1, “Memory/Cache Access Attributes,” for a complete
description of the caching-inhibited and guarded attributes. Additionally, two instructions, 
eieio and sync, are provided that enable the program to control the order in which the 
memory accesses caused by separate instructions are performed. 


No ordering should be assumed among the memory accesses caused by a single instruction 
(that is, by an instruction for which multiple accesses are not atomic), and no means are 
provided for controlling that order. Chapter 4, “Addressing Modes and Instruction Set 
Summary,” contains additional information about the sync and eieio instructions. 


5.1.1.1 Enforce In-Order Execution of I/O Instruction 

The eieio instruction permits the program to control the order in which loads and stores are 
performed when the accessed memory has certain attributes, as described in Chapter 8, 
“Instruction Set.” For example, eieio can be used to ensure that a sequence of load and store 
operations to an I/O device’s control registers updates those registers in the desired order. 
The eieio instruction can also be used to ensure that all stores to a shared data structure are
visible to other processors before the store that releases the lock is visible to them.
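
The device-register case might look like the C sketch below. A command is written, then a status register is read, with eieio between the two accesses so that the load is not performed before the store. The register layout and names are hypothetical, the registers are assumed to be mapped caching-inhibited, and GCC-style inline assembly is assumed. On non-PowerPC hosts the barrier falls back to a compiler-only fence and an ordinary structure stands in for the device, which is enough to run the example but provides no hardware ordering.

```c
#include <stdint.h>

#if defined(__powerpc__)
#define io_barrier() __asm__ volatile("eieio" : : : "memory")
#else
#define io_barrier() __atomic_signal_fence(__ATOMIC_SEQ_CST)   /* compiler-only stand-in */
#endif

/* Hypothetical device register block (caching-inhibited mapping assumed). */
struct dev_regs {
    volatile uint32_t command;   /* offset 0: write to start an operation */
    volatile uint32_t status;    /* offset 4: read to check the device state */
};

/* Issue a command, then read the status the device reports. */
static uint32_t dev_issue(struct dev_regs *regs, uint32_t cmd)
{
    regs->command = cmd;
    io_barrier();            /* keep the status load behind the command store */
    return regs->status;
}
```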


The eieio instruction may complete before memory accesses caused by instructions 
preceding the eieio instruction have been performed with respect to system memory or 
coherent storage as appropriate. 


If stronger ordering is desired, the sync instruction must be used. 


5.1.1.2 Synchronize Instruction 


When a portion of memory that requires coherency must be forced to a known state, it is 
necessary to synchronize memory with respect to other processors and mechanisms. This 
synchronization is accomplished by requiring programs to indicate explicitly in the 
instruction stream, by inserting a sync instruction, that synchronization is required. Only
when sync completes are the effects of all coherent memory accesses previously executed 
by the program guaranteed to have been performed with respect to all other processors and 
mechanisms that access those locations coherently. 


The sync instruction ensures that all the coherent memory accesses, initiated by a program, 
have been performed with respect to all other processors and mechanisms that access the 
target locations coherently, before its next instruction is executed. A program can use this 
instruction to ensure that all updates to a shared data structure, accessed coherently, are 
visible to all other processors that access the data structure coherently, before executing a 
store that will release a lock on that data structure. Execution of the sync instruction does 
the following: 


• Performs the functions described for the sync instruction in Section 4.2.6, “Memory
Synchronization Instructions—UISA.”


• Ensures that consistency operations, and the effects of icbi, dcbz, dcbst, dcbf, dcba,
and dcbi instructions previously executed by the processor executing sync, have
completed on such other processors as the memory/cache access attributes of the
target locations require.


• Ensures that TLB invalidate operations previously executed by the processor
executing the sync have completed on that processor. The sync instruction does not
wait for such invalidates to complete on other processors.


• Ensures that memory accesses due to instructions previously executed by the
processor executing the sync are recorded in the R and C bits in the page table and
that the new values of those bits are visible to all processors and mechanisms; refer
to Section 7.5.3, “Page History Recording.”


The sync instruction is execution synchronizing. It is not context synchronizing, and
therefore need not discard prefetched instructions.

For memory that does not require coherency, the sync instruction operates as described
above except that its only effect on memory operations is to ensure that all previous
memory operations have completed, with respect to the processor executing the sync
instruction, to the level of memory specified by the memory/cache access attributes 
(including the updating of R and C bits). 


5.1.2 Atomicity 


An access is atomic if it is always performed in its entirety with no visible fragmentation. 
Atomic accesses are thus serialized—each happens in its entirety in some order, even when 
that order is neither specified in the program nor enforced between processors. 


Only the following single-register accesses are guaranteed to be atomic: 


• Byte accesses (all bytes are aligned on byte boundaries)
• Half-word accesses aligned on half-word boundaries
• Word accesses aligned on word boundaries


No other accesses are guaranteed to be atomic. In particular, the accesses caused by the 
following instructions are not guaranteed to be atomic: 


• Load and store instructions with misaligned operands

• lmw, stmw, lswi, lswx, stswi, or stswx instructions

• Floating-point double-word accesses in 32-bit implementations

• Any cache management instructions
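
For single-register loads and stores, the guarantee above reduces to a size and alignment check. A minimal sketch; the function name is hypothetical, and the check deliberately does not apply to lmw, stmw, the string instructions, or cache management instructions, none of which is guaranteed to be atomic.

```python
def access_is_guaranteed_atomic(address: int, size: int) -> bool:
    """Return True only for the single-register accesses listed as atomic.

    size is the operand size in bytes: 1 (byte), 2 (half word), or 4 (word).
    Any other size, such as an 8-byte floating-point double word on a
    32-bit implementation, is not guaranteed to be atomic.
    """
    if size not in (1, 2, 4):
        return False
    # Bytes are always aligned; half words and words must be naturally aligned.
    return address % size == 0
```

For example, a word access at 0x1004 is guaranteed to be atomic, but one at 0x1006 is not.
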


The lwarx/stwcx. instruction combination can be used to perform atomic memory
references. The lwarx instruction is a load from a word-aligned location that has two side
effects: 


1. A reservation for a subsequent stwcx. instruction is created.


2. The memory coherence mechanism is notified that a reservation exists for the 
memory location accessed by the lwarx. 


The stwcx. instruction is a store to a word-aligned location that is conditioned on the
existence of the reservation created by lwarx and on whether the same memory location is 
specified by both instructions and whether the instructions are issued by the same 
processor. 
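
The reservation mechanism can be sketched as a small software model, together with the retry loop that a lwarx/stwcx. pair is normally placed in. The class and function names are hypothetical, and the model makes simplifying assumptions that are not taken from this section: a reservation covers exactly one word (real hardware uses an implementation-dependent reservation granule), a store by any other processor to that word cancels the reservation, a stwcx. to a different address than the reserved one is treated as failing, and every stwcx. consumes the executing processor's reservation.

```python
class ReservationSystem:
    """Hypothetical model of lwarx/stwcx. reservations across processors."""

    def __init__(self):
        self.memory = {}        # word address -> value
        self.reservation = {}   # processor id -> reserved word address

    def lwarx(self, cpu, addr):
        if addr % 4:
            raise ValueError("lwarx requires a word-aligned address")
        self.reservation[cpu] = addr      # side effect 1: create a reservation
        return self.memory.get(addr, 0)

    def store(self, cpu, addr, value):
        # An ordinary store, seen by the coherence mechanism (side effect 2):
        # it cancels any other processor's reservation on the same word.
        self.memory[addr] = value
        for other, reserved in list(self.reservation.items()):
            if other != cpu and reserved == addr:
                del self.reservation[other]

    def stwcx(self, cpu, addr, value):
        """Return True if the conditional store was performed."""
        reserved = self.reservation.pop(cpu, None)   # reservation is consumed
        if reserved != addr:
            return False
        self.store(cpu, addr, value)
        return True


def atomic_increment(system, cpu, addr, interfere=None):
    """Retry loop in the style of: loop: lwarx; addi; stwcx.; bne- loop.

    interfere, if given, runs once between the first lwarx and stwcx.
    to simulate another processor storing to the same word.
    """
    attempts = 0
    while True:
        attempts += 1
        old = system.lwarx(cpu, addr)
        if interfere is not None and attempts == 1:
            interfere()
        if system.stwcx(cpu, addr, old + 1):
            return old + 1, attempts
```

If another processor stores to the word between the lwarx and the stwcx., the first stwcx. fails and the loop reloads the new value before retrying, so no update is lost.
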


In a multiprocessor system, every processor (other than the one executing lwarx/stwcx.)
that might update the location must configure the addressed page as memory coherency 
required. The lwarx/stwcx. instructions function in caching-inhibited, as well as in
caching-allowed, memory. If the addressed memory is in write-through mode, it is 
implementation-dependent whether these instructions function correctly or cause the DSI 
exception handler to be invoked. (Note that exceptions are referred to as interrupts in the 
architecture specification.) 

The lwarx/stwcx. instruction combination is described in Section 4.2.6, “Memory
Synchronization Instructions—UISA,” and Chapter 8, “Instruction Set.” 


5.1.3 Cache Model 


The PowerPC architecture does not specify the type, organization, implementation, or even 
the existence of a cache. The standard cache model has separate instruction and data caches, 
also known as a Harvard cache model. However, the architecture allows for many different 
cache types. Some implementations will have a unified cache (where there is a single cache 
for both instructions and data). Other implementations may not have a cache at all. 


The function of the cache management instructions depends on the implementation of the 
cache(s) and the setting of the memory/cache access modes. For a program to execute 
properly on all implementations, software should use the Harvard model. In cases where a 
processor is implemented without a cache, the architecture guarantees that instructions 
affecting the nonimplemented cache will not halt execution (note that dcbz may cause an
alignment exception on some implementations). For example, a processor with no cache 
may treat a cache instruction as a no-op. Or, a processor with a unified cache may treat the 
icbi instruction as a no-op. In this manner, programs written for separate instruction and 
data caches will run on all compliant implementations. 


5.1.4 Memory Coherency 


The primary objective of a coherent memory system is to provide the same image of 
memory to all devices using the system. The VEA and OEA define coherency controls that 
facilitate synchronization, cooperative use of shared resources, and task migration among 
processors. These controls include the memory/cache access attributes, the sync and eieio 
instructions, and the lwarx/stwcx. instruction pair. Without these controls, the processor
could not support a weakly-ordered memory access model. 


A strongly-ordered memory access model hinders performance by requiring excessive 
overhead, particularly in multiprocessor environments. For example, a processor 
performing a store operation in a strongly-ordered system requires exclusive access to an 
address before making an update, to prevent another device from using stale data. 


The VEA defines a page as a unit of memory for which protection and control attributes are 
independently specifiable. The OEA (supervisor level) specifies the size of a page as 
4 Kbytes. It is important to note that the VEA (user level) does not specify the page size. 

5.1.4.1 Memory/Cache Access Modes

The OEA defines the set of memory/cache access modes and the mechanism to implement 
these modes. Refer to Section 5.2.1, “Memory/Cache Access Attributes,” for more
information. However, the VEA specifies that at the user level, the operating system can be 
expected to provide the following attributes for each page of memory: 


• Write-through or write-back

• Caching-inhibited or caching-allowed

• Memory coherency required or memory coherency not required

• Guarded or not guarded


User-level programs specify the memory/cache access attributes through an operating 
system service. 


5.1.4.1.1 Pages Designated as Write-Through 

When a page is designated as write-through, store operations update the data in the cache 
and also update the data in main memory. The processor writes to the cache and through to 
main memory. Load operations use the data in the cache, if it is present. 


In write-back mode, the processor is only required to update data in the cache. The 
processor may (but is not required to) update main memory. Load and store operations use 
the data in the cache, if it is present. The data in main memory does not necessarily stay 
consistent with that same location’s data in the cache. Many implementations automatically 
update main memory in response to a memory access by another device (for example, a 
snoop hit). In addition, the dcbst and dcbf instructions can be used to explicitly force an
update of main memory. 
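
The difference between the two modes can be seen in a toy single-processor cache. This is a sketch under stated assumptions: ToyDataCache is hypothetical, it tracks individual addresses rather than cache blocks, it never evicts, stores always allocate an entry, and in write-back mode main memory is updated only by an explicit dcbst (modeled as a method) rather than by any automatic copy-back.

```python
class ToyDataCache:
    """Hypothetical single-processor data cache keyed by address."""

    def __init__(self, memory, write_through):
        self.memory = memory            # main memory: address -> value
        self.write_through = write_through
        self.entries = {}               # cached copies: address -> value
        self.modified = set()           # write-back entries newer than memory

    def load(self, addr):
        if addr not in self.entries:    # miss: fill from main memory
            self.entries[addr] = self.memory.get(addr, 0)
        return self.entries[addr]

    def store(self, addr, value):
        self.entries[addr] = value      # update the cached copy
        if self.write_through:
            self.memory[addr] = value   # and write through to main memory
        else:
            self.modified.add(addr)     # main memory is now stale

    def dcbst(self, addr):
        # Explicitly force an update of main memory for a modified entry.
        if addr in self.modified:
            self.memory[addr] = self.entries[addr]
            self.modified.discard(addr)
```

With write_through=False, a store leaves main memory unchanged until dcbst is executed; with write_through=True, main memory is updated by the store itself.
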


The write-through attribute is meaningless for locations designated as caching-inhibited. 


5.1.4.1.2 Pages Designated as Caching-Inhibited 

When a page is designated as caching-inhibited, the processor bypasses the cache and 
performs load and store operations to main memory. When a page is designated as caching- 
allowed, the processor uses the cache and performs load and store operations to the cache 
or main memory depending on the other memory/cache access attributes for the page. 


It is important that all locations in a page are purged from the cache prior to changing the 
memory/cache access attribute for the page from caching-allowed to caching-inhibited. It 
is considered a programming error if a caching-inhibited memory location is found in the 
cache. Software must ensure that the location has not previously been brought into the 
cache, or, if it has, that it has been flushed from the cache. If the programming error occurs, 
the result of the access is boundedly undefined. 
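
Purging a page before it becomes caching-inhibited amounts to one dcbf per cache block in the page. The sketch below only computes the instruction sequence; the function name is hypothetical, the 32-byte cache block size is an assumption (block size is implementation-dependent), the trailing sync is an assumed ordering step so that the flushes complete before the attribute change, and the page table update itself is not shown.

```python
PAGE_SIZE = 4096          # OEA page size
CACHE_BLOCK_SIZE = 32     # assumption: implementation-dependent


def flush_sequence_for_page(address, block_size=CACHE_BLOCK_SIZE):
    """Return (mnemonic, EA) pairs that purge the page containing address."""
    base = address & ~(PAGE_SIZE - 1)            # round down to the page start
    ops = [("dcbf", ea) for ea in range(base, base + PAGE_SIZE, block_size)]
    ops.append(("sync", None))                   # wait for the flushes to finish
    return ops
```

Under these assumptions, one page generates 4096 / 32 = 128 dcbf instructions followed by one sync.
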

5.1.4.1.3 Pages Designated as Memory Coherency Required

When a page is designated as memory coherency required, store operations to that location 
are serialized with all stores to that same location by all other processors that also access 
the location coherently. This can be implemented, for example, by an ownership protocol
that allows at most one processor at a time to store to the location. Moreover, the current 
copy of a cache block that is in this mode may be copied to main storage any number of 
times, for example, by successive dcbst instructions.


Coherency does not ensure that the result of a store by one processor is visible immediately 
to all other processors and mechanisms. Only after a program has executed the sync
instruction are the previous storage accesses it executed guaranteed to have been performed 
with respect to all other processors and mechanisms. 


5.1.4.1.4 Pages Designated as Memory Coherency Not Required 

For a memory area that is configured such that coherency is not required, software must 
ensure that the data cache is consistent with main storage before changing the mode or 
allowing another device to access the area. 


Executing a dcbst or dcbf instruction specifying a cache block that is in this mode causes
the block to be copied to main memory if and only if the processor modified the contents 
of a location in the block and the modified contents have not been written to main memory. 


In a single-cache system, correct execution is unlikely to require hardware-enforced memory
coherency; therefore, using memory coherency not required mode can improve performance.


5.1.4.1.5 Pages Designated as Guarded 

The guarded attribute pertains to out-of-order execution. Refer to Section 5.2.1.5.3,
“Out-of-Order Accesses to Guarded Memory,” for more information about out-of-order
execution.


When a page is designated as guarded, instructions and data cannot be accessed out of 
order. Additionally, if separate store instructions access memory that is both caching- 
inhibited and guarded, the accesses are performed in the order specified by the program. 
When a page is designated as not guarded, out-of-order fetches and accesses are allowed. 


5.1.4.2 Coherency Precautions 

Mismatched memory/cache attributes cause coherency paradoxes in both single-processor 
and multiprocessor systems. When the memory/cache access attributes are changed, it is 
critical that the cache contents reflect the new attribute settings. For example, if a block or 
page that had allowed caching becomes caching-inhibited, the appropriate cache blocks 
should be flushed to leave no indication that caching had previously been allowed. 


Although coherency paradoxes are considered programming errors, specific 
implementations may attempt to handle the offending conditions and minimize the negative 
effects on memory coherency. Bus operations that are generated for specific instructions 
and state conditions are not defined by the architecture. 

5.1.5 VEA Cache Management Instructions


The VEA defines instructions for controlling both the instruction and data caches. For 
implementations that have a unified instruction/data cache, instruction cache control 
instructions are valid instructions, but may function differently. 


Note that any cache control instruction that generates an EA that corresponds to a direct- 
store segment (SR[T] = 1) is treated as a no-op. However, the direct-store facility is being 
phased out of the architecture and will not likely be supported in future devices. Thus, 
software should not depend on its effects. 


This section briefly describes the cache management instructions available to programs at 
the user privilege level. Additional descriptions of coding the VEA cache management 
instructions are provided in Chapter 4, “Addressing Modes and Instruction Set Summary,”
and Chapter 8, “Instruction Set.” In the following instruction descriptions, the target is the 
cache block containing the byte addressed by the effective address. 


5.1.5.1 Data Cache Instructions 

Data caches and unified caches must be consistent with other caches (data or unified), 
memory, and I/O data transfers. To ensure consistency, aliased effective addresses (two 
effective addresses that map to the same physical address) must have the same page offset. 
Note that physical address is referred to as real address in the architecture specification. 


5.1.5.1.1 Data Cache Block Touch (dcbt) and Data Cache Block Touch for Store (dcbtst) Instructions
These instructions provide a method for improving performance through the use of 
software-initiated prefetch hints. However, these instructions do not guarantee that a cache 
block will be fetched. 


A program uses the dcbt instruction to request a cache block fetch before it is needed by
the program. The program can then use the data from the cache rather than fetching from 
main memory. 


The dcbtst instruction behaves similarly to the dcbt instruction. A program uses dcbtst to
request a cache block fetch so that a subsequent store is likely to be to a cached location;
as with dcbt, the fetch itself is not guaranteed.


The processor does not invoke the exception handler for translation or protection violations 
caused by either of the touch instructions. Additionally, memory accesses caused by these 
instructions are not necessarily recorded in the page tables. If an access is recorded, then it 
is treated in a manner similar to that of a load from the addressed byte. Some 
implementations may not take any action based on the execution of these instructions, or 
they may prefetch the cache block corresponding to the EA into their cache. For 
information about the R and C bits, see Section 7.5.3, “Page History Recording.” 

Both dcbt and dcbtst are provided for performance optimization. These instructions do not
affect the correct execution of a program, regardless of whether they succeed (fetch the 
cache block) or fail (do not fetch the cache block). If the target block is not accessible to 
the program for loads, then no operation occurs. 


5.1.5.1.2 Data Cache Block Set to Zero (dcbz) Instruction 
The dcbz instruction clears a single cache block as follows:


• If the target is in the data cache, all bytes of the cache block are cleared.


• If the target is not in the data cache and the corresponding page is caching-allowed,
the cache block is established in the data cache (without fetching the cache block
from main memory), and all bytes of the cache block are cleared.


• If the target is designated as either caching-inhibited or write-through, then either all
bytes in main memory that correspond to the addressed cache block are cleared, or
the alignment exception handler is invoked. The exception handler should clear all
the bytes in main memory that correspond to the addressed cache block.


• If the target is designated as coherency required, and the cache block exists in the
data cache(s) of any other processor(s), it is kept coherent in those caches.
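
When dcbz targets caching-inhibited or write-through memory and the alignment exception handler is invoked, the handler must clear the whole addressed block in main memory. A minimal sketch of that clearing step; the function name is hypothetical, memory is modeled as a bytearray indexed by physical address, the 32-byte block size is an assumption, and the rest of the handler (decoding the instruction, translating the EA, returning from the exception) is not shown.

```python
CACHE_BLOCK_SIZE = 32   # assumption: implementation-dependent


def emulate_dcbz(memory: bytearray, address: int, block_size: int = CACHE_BLOCK_SIZE) -> None:
    """Clear every byte of the cache block that contains address."""
    start = address & ~(block_size - 1)       # dcbz acts on the whole aligned block
    memory[start:start + block_size] = bytes(block_size)
```

Because the whole aligned block is cleared, an address such as 0x25 clears bytes 0x20 through 0x3F, not just the bytes from 0x25 onward.
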


The dcbz instruction is treated as a store to the addressed byte with respect to address 
translation, protection, referenced and changed recording, and the ordering enforced by 
eieio or by the combination of caching-inhibited and guarded attributes for a page. 


Refer to Chapter 6, “Exceptions,” for more information about a possible delayed machine 
check exception that can occur by using dcbz when the operating system has set up an
incorrect memory mapping. 


5.1.5.1.3 Data Cache Block Store (dcbst) Instruction 
The dcbst instruction permits the program to ensure that the latest version of the target 
cache block is in main memory. The dcbst instruction executes as follows:

• Coherency required—If the target exists in the data cache(s) of any processor(s) and
has been modified, the data is written to main memory.


• Coherency not required—If the target exists in the data cache of the executing
processor and has been modified, the data is written to main memory.


The function of this instruction is independent of the write-through/write-back and 
caching-inhibited/caching-allowed attributes of the target. 


The memory access caused by a dcbst instruction is not necessarily recorded in the page
tables. If the access is recorded, then it is treated as a load operation (not as a store 
operation). 
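
The two dcbst cases differ only in which caches are examined. A toy sketch, assuming a hypothetical structure in which each processor's data cache is a dict from block address to a (data, modified) pair; it also assumes that the written-back block stays valid in the cache and is simply marked unmodified.

```python
def dcbst(caches, memory, executing_cpu, block, coherency_required):
    """Toy dcbst: write a modified copy of block back to main memory.

    caches[cpu] maps block address -> (data, modified) for one processor.
    """
    cpus = range(len(caches)) if coherency_required else [executing_cpu]
    for cpu in cpus:
        entry = caches[cpu].get(block)
        if entry is not None and entry[1]:       # present and modified
            data = entry[0]
            memory[block] = data                 # update main memory
            caches[cpu][block] = (data, False)   # still cached, now unmodified
```

In coherency not required mode, a modified copy held by another processor is not written back, which is why software must manage such areas explicitly.
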

5.1.5.1.4 Data Cache Block Flush (dcbf) Instruction

The action taken depends on the memory/cache access mode associated with the target, and 
on the state of the cache block. The following list describes the action taken for the various 
cases: 


• Coherency required

  – Unmodified cache block—Invalidates copies of the cache block in the data caches
    of all processors.

  – Modified cache block—Copies the cache block to memory. Invalidates copies of the
    cache block in the data caches of all processors.

  – Target block not in cache—If a modified copy of the cache block is in the data
    cache(s) of any processor(s), dcbf causes the modified cache block to be copied to
    memory and then invalidated. If unmodified copies are in the data caches of other
    processors, dcbf causes those copies to be invalidated.

• Coherency not required

  – Unmodified cache block—Invalidates the cache block in the executing processor's
    data cache.

  – Modified cache block—Copies the data cache block to memory and then invalidates
    the cache block in the executing processor.

  – Target block not in cache—No action is taken.


The function of this instruction is independent of the write-through/write-back and 
caching-inhibited/caching-allowed attributes of the target. 


The memory access caused by a dcbf instruction is not necessarily recorded in the page
tables. If the access is recorded, then it is treated as a load operation (not as a store 
operation). 
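
The dcbf cases listed above collapse into one rule for each processor examined: if the block is present, copy it to memory when it is modified, then invalidate it. A toy sketch, assuming a hypothetical structure in which each processor's data cache is a dict from block address to a (data, modified) pair; bus transactions and ordering between processors are not modeled.

```python
def dcbf(caches, memory, executing_cpu, block, coherency_required):
    """Toy dcbf: copy a modified block to memory, then invalidate it.

    caches[cpu] maps block address -> (data, modified) for one processor.
    """
    cpus = range(len(caches)) if coherency_required else [executing_cpu]
    for cpu in cpus:
        entry = caches[cpu].pop(block, None)     # invalidate; no-op if absent
        if entry is not None:
            data, modified = entry
            if modified:
                memory[block] = data             # modified data is copied to memory
```

The "target block not in cache" cases need no special handling here: in coherency required mode the loop still visits the other processors' caches, and in coherency not required mode nothing happens.
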


5.1.5.2 Instruction Cache Instructions 


Instruction caches, if they exist, are not required to be consistent with data caches, memory, 
or I/O data transfers. Software must use the appropriate cache management instructions to 
ensure that instruction caches are kept coherent when instructions are modified by the 
processor or by input data transfer. When a processor alters a memory location that may be 
contained in an instruction cache, software must ensure that updates to memory are visible 
to the instruction fetching mechanism. Although the instructions to enforce consistency 
vary among implementations, the following sequence for a uniprocessor system is typical: 


1. dcbst (update memory) 

2. sync (wait for update) 

3. icbi (invalidate copy in instruction cache)

4. isync (perform context synchronization)

Note that most operating systems will provide a system service for this function. These
operations are necessary because the memory may be designated as write-back. Since 
instruction fetching may bypass the data cache, changes made to items in the data cache 
may not otherwise be reflected in memory until after the instruction fetch completes. 


For implementations used in multiprocessor systems, variations on this sequence may be 
recommended. For example, in a multiprocessor system with a unified instruction/data 
cache (at any level), if instructions are fetched without coherency being enforced, the 
preceding instruction sequence is inadequate. Because the icbi instruction does not 
invalidate blocks in a unified cache, a dcbf instruction should be used instead of a dcbst
instruction for this case. 
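
Applied to a range of modified instructions, the four-step sequence above becomes one dcbst per cache block, a sync, one icbi per block, and a final isync. The sketch below only builds the instruction list; the function name and the 32-byte block size are assumptions, and use_dcbf selects dcbf for the multiprocessor unified-cache case just described. It follows the four steps as listed; a particular processor's user's manual may recommend variations.

```python
CACHE_BLOCK_SIZE = 32   # assumption: implementation-dependent


def icache_sync_sequence(start, length, block_size=CACHE_BLOCK_SIZE, use_dcbf=False):
    """Instructions that make newly stored instructions in [start, start+length) fetchable."""
    if length <= 0:
        return []
    first = start & ~(block_size - 1)               # first block touched
    blocks = range(first, start + length, block_size)
    flush = "dcbf" if use_dcbf else "dcbst"
    seq = [f"{flush} 0x{ea:x}" for ea in blocks]   # 1. update memory
    seq.append("sync")                              # 2. wait for the updates
    seq += [f"icbi 0x{ea:x}" for ea in blocks]      # 3. invalidate I-cache copies
    seq.append("isync")                             # 4. context synchronization
    return seq
```

For 0x40 bytes of new code starting at 0x1004, the range touches three blocks (0x1000, 0x1020, and 0x1040), so each of the per-block steps is issued three times.
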


5.1.5.2.1 Instruction Cache Block Invalidate Instruction (icbi) 
The icbi instruction executes as follows: 


• Coherency required


If the target is in the instruction cache of any processor, the cache block is made 
invalid in all such processors, so that the next reference causes the cache block to be 
refetched. 


• Coherency not required


If the target is in the instruction cache of the executing processor, the cache block is 
made invalid in the executing processor so that the next reference causes the cache 
block to be refetched. 


The icbi instruction is provided for use in processors with separate instruction and data 
caches. The effective address is computed, translated, and checked for protection violations 
as defined in Chapter 7, “Memory Management.” If the target block is not accessible to the 
program for loads, then a DSI exception occurs. 


The function of this instruction is independent of the write-through/write-back and 
caching-inhibited/caching-allowed attributes of the target. 


The memory access caused by an icbi instruction is not necessarily recorded in the page 
tables. If the access is recorded, then it is treated as a load operation. Implementations that 
have a unified cache treat the icbi instruction as a no-op except that they may invalidate the 
target cache block in the instruction caches of other processors (in coherency required 
mode). 


5.1.5.2.2 Instruction Synchronize Instruction (isync) 

The isync instruction provides an ordering function for the effects of all instructions
executed by a processor. Executing an isync instruction ensures that all instructions
preceding the isync instruction have completed before the isync instruction completes,
except that memory accesses caused by those instructions need not have been performed
with respect to other processors and mechanisms. It also ensures that no subsequent
instructions are initiated by the processor until after the isync instruction completes.
Finally, it causes the processor to discard any prefetched instructions, with the effect that
subsequent instructions will be fetched and executed in the context established by the
instructions preceding the isync instruction. The isync instruction has no effect on other
processors or on their caches. 


5.2 The Operating Environment 


The OEA defines the mechanism for controlling the memory/cache access modes 
introduced in Section 5.1.4.1, “Memory/Cache Access Modes.” This section describes the 
cache-related aspects of the OEA including the memory/cache access attributes, out-of- 
order execution, direct-store interface considerations, and the dcbi instruction. The features
of the OEA are accessible to supervisor-level applications only. The mechanism for 
controlling the virtual memory space is described in Chapter 7, “Memory Management.” 


The memory model of PowerPC processors provides the following features:

• Flexibility to allow performance benefits of weakly-ordered memory access
• A mechanism to maintain memory coherency among processors and between a
processor and I/O devices controlled at the block and page level
• Instructions that can be used to ensure a consistent memory state
• Guaranteed processor access order


The memory implementations in PowerPC systems can take advantage of the performance 
benefits of weak ordering of memory accesses between processors or between processors 
and other external devices without any additional complications. Memory coherency can 
be enforced externally by a snooping bus design, a centralized cache directory design, or 
other designs that can take advantage of the coherency features of PowerPC processors. 


Memory accesses performed by a single processor appear to complete sequentially from 
the view of the programming model but may complete out of order with respect to the 
ultimate destination in the memory hierarchy. Order is guaranteed at each level of the 
memory hierarchy for accesses to the same address from the same processor. The dcbst,
dcbf, icbi, isync, sync, eieio, lwarx, and stwcx. instructions allow the programmer to
ensure a consistent memory state. 


5.2.1 Memory/Cache Access Attributes 
All instruction and data accesses are performed under the control of the four memory/cache 
access attributes: 

• Write-through (W attribute)

• Caching-inhibited (I attribute)

• Memory coherency (M attribute)

• Guarded (G attribute)

These attributes are programmed in the PTEs and BATs by the operating system for each
page and block respectively. The W and I attributes control how the processor performing 
an access uses its own cache. The M attribute ensures that coherency is maintained for all 
copies of the addressed memory location. When an access requires coherency, the processor 
performing the access must inform the coherency mechanisms throughout the system that 
the access requires memory coherency. The G attribute prevents out-of-order loading and 
prefetching from the addressed memory location. 


Note that the memory/cache access attributes are relevant only when an effective address is 
translated by the processor performing the access. Note also that not all combinations of 
settings of these bits are supported. The attributes are not saved along with data in the cache
(for cacheable accesses), nor are they associated with subsequent accesses made by other 
processors. 


The operating system programs the memory/cache access attribute for each page or block 
as required. The WIMG attributes occupy four bits in the BAT registers for block address 
translation and in the PTEs for page address translation. The WIMG bits are programmed 
as follows: 


• The operating system uses the mtspr instruction to program the WIMG bits in the
BAT registers for block address translation. The IBAT register pairs implement the
W or G bits; however, attempting to set either bit in IBAT registers causes
boundedly-undefined results.


• The operating system writes the WIMG bits for each page into the PTEs in system
memory as it sets up the page tables.


Note that for data accesses performed in real addressing mode (MSR[DR] = 0), the WIMG 
bits are assumed to be 0b0011 (the data is write-back, caching is enabled, memory
coherency is enforced, and memory is guarded). For instruction accesses performed in real
addressing mode (MSR[IR] = 0), the WIMG bits are assumed to be 0b0001 (the data is
write-back, caching is enabled, memory coherency is not enforced, and memory is 
guarded). 
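
The four attribute bits can be handled as a small value type. A sketch with hypothetical names; the bits are packed here in the order W, I, M, G from most to least significant, matching the 0b0011 and 0b0001 notation above, while their exact positions within PTEs and BAT registers (Chapter 7) are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WIMG:
    w: bool  # write-through
    i: bool  # caching-inhibited
    m: bool  # memory coherency required
    g: bool  # guarded

    @classmethod
    def from_bits(cls, bits: int) -> "WIMG":
        """Decode a 4-bit value written as 0bWIMG."""
        return cls(bool(bits & 0b1000), bool(bits & 0b0100),
                   bool(bits & 0b0010), bool(bits & 0b0001))

    def to_bits(self) -> int:
        return (self.w << 3) | (self.i << 2) | (self.m << 1) | int(self.g)


# Attributes assumed for real addressing mode, per the text above.
REAL_MODE_DATA = WIMG.from_bits(0b0011)         # MSR[DR] = 0
REAL_MODE_INSTRUCTION = WIMG.from_bits(0b0001)  # MSR[IR] = 0
```

REAL_MODE_DATA decodes to M and G set with W and I clear; REAL_MODE_INSTRUCTION differs only in having M clear.
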


5.2.1.1 Write-Through Attribute (W) 

When an access is designated as write-through (W = 1), if the data is in the cache, a store 
operation updates the cached copy of the data. In addition, the update is written to the 
memory location. The definition of the memory location to be written to (in addition to the 
cache) depends on the implementation of the memory system but can be illustrated by the 
following examples: 


• RAM—The store is sent to the RAM controller to be written into the target RAM.


• I/O device—The store is sent to the memory-mapped I/O controller to be written to
the target register or memory location.


In systems with multilevel caching, the store must be written to at least a depth in the 
memory hierarchy that is seen by all processors and devices. 

Multiple store instructions may be combined for write-through accesses except when the
store instructions are separated by a sync or eieio instruction. A store operation to a memory 
location designated as write-through may cause any part of the cache block to be written 
back to main memory. 


Accesses that correspond to W = 0 are considered write-back. For this case, although the 
store operation is performed to the cache, the data is copied to memory only when a copy- 
back operation is required. Use of the write-back mode (W = 0) can improve overall
performance for areas of the memory space that are seldom referenced by other processors 
or devices in the system. 


Accesses to the same memory location using two effective addresses for which the W bit 
setting differs meet the memory-coherency requirements if the accesses are performed by 
a single processor. If the accesses are performed by two or more processors, coherence is 
enforced by the hardware only if the write-through attribute is the same for all the accesses. 


5.2.1.2 Caching-Inhibited Attribute (I) 


If I = 1, the memory access is completed by referencing the location in main memory, 
bypassing the cache. During the access, the addressed location is not loaded into the cache 
nor is the location allocated in the cache. 


It is considered a programming error if a copy of the target location of an access to caching- 
inhibited memory is resident in the cache. Software must ensure that the location has not 
been previously loaded into the cache, or, if it has, that it has been flushed from the cache. 


Data accesses from more than one instruction may be combined for cache-inhibited 
operations, except when the accesses are separated by a sync instruction, or by an eieio 
instruction when the page or block is also designated as guarded. 


Instruction fetches, dcbz instructions, and load and store operations to the same memory
location using two effective addresses for which the I bit setting differs must meet the 
requirement that a copy of the target location of an access to caching-inhibited memory not 
be in the cache. Violation of this requirement is considered a programming error; software 
must ensure that the location has not previously been brought into the cache or, if it has, 
that it has been flushed from the cache. If the programming error occurs, the result of the 
access is boundedly undefined. It is not considered a programming error if the target 
location of any other cache management instruction to caching-inhibited memory is in the 
cache. 

5.2.1.3 Memory Coherency Attribute (M)


This attribute is provided to allow improved performance in systems where hardware- 
enforced coherency is relatively slow, and software is able to enforce the required 
coherency. When M = 0, there are no requirements to enforce data coherency. When M = 1, 
the processor enforces data coherency. 


When the M attribute is set, and the access is performed to memory, there is a hardware 
indication to the rest of the system that the access is global. Other processors affected by 
the access must then respond to this global access. For example, in a snooping bus design, 
the processor may assert some type of global access signal. Other processors affected by 
the access respond and signal whether the data is being shared. If the data in another 
processor is modified, then the location is updated and the access is retried. 


Because instruction memory does not have to be coherent with data memory, some 
implementations may ignore the M attribute for instruction accesses. In a single-processor 
(or single-cache) system, performance might be improved by designating all pages as 
memory coherency not required. 


Accesses to the same memory location using two effective addresses for which the M bit 
settings differ may require explicit software synchronization before accessing the location 
with M = 1 if the location has previously been accessed with M = 0. Any such requirement 
is system-dependent. For example, no software synchronization may be required for 
systems that use bus snooping. In some directory-based systems, software may be required 
to execute dcbf instructions on each processor to flush all storage locations accessed with
M = 0 before accessing those locations with M = 1. 


5.2.1.4 W, I, and M Bit Combinations

Table 5-1 summarizes the six combinations of the WIM bits supported by the OEA. The 
combinations where WIM = 11x are not supported. Note that either a zero or one setting 
for the G bit is allowed for each of these WIM bit combinations. 


Table 5-1. Combinations of W, I, and M Bits

WIM Setting   Meaning

000           The processor may cache data (or instructions).
              A load or store operation whose target hits in the cache may use that entry in the cache.
              The processor does not need to enforce memory coherency for accesses it initiates.

001           Data (or instructions) may be cached.
              A load or store operation whose target hits in the cache may use that entry in the cache.
              The processor enforces memory coherency for accesses it initiates.

010           Caching is inhibited.
              The access is performed to memory, completely bypassing the cache.
              The processor does not need to enforce memory coherency for accesses it initiates.

011           Caching is inhibited.
              The access is performed to memory, completely bypassing the cache.
              The processor enforces memory coherency for accesses it initiates.

100           Data (or instructions) may be cached.
              A load operation whose target hits in the cache may use that entry in the cache.
              Store operations are written to memory. The target location of the store may be
              cached and is updated on a hit.
              The processor does not need to enforce memory coherency for accesses it initiates.

101           Data (or instructions) may be cached.
              A load operation whose target hits in the cache may use that entry in the cache.
              Store operations are written to memory. The target location of the store may be
              cached and is updated on a hit.
              The processor enforces memory coherency for accesses it initiates.
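As an illustration of how the three bits combine, the following C sketch decodes a WIM value into the attributes listed in Table 5-1 and flags the unsupported WIM = 11x settings. Packing W, I, and M into bits 2, 1, and 0 of one integer, and the names wim_attrs and decode_wim, are assumptions made for this sketch only; the G bit is independent of these combinations and is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoded view of one WIM setting (not a PowerPC API). */
typedef struct {
    bool supported;      /* false for the unsupported WIM = 11x settings */
    bool cacheable;      /* I = 0: data (or instructions) may be cached */
    bool write_through;  /* W = 1: stores are also written to memory */
    bool coherent;       /* M = 1: the processor enforces memory coherency */
} wim_attrs;

/* Decode a 3-bit WIM value packed (by assumption) as W = bit 2,
 * I = bit 1, M = bit 0. */
static wim_attrs decode_wim(unsigned wim)
{
    assert(wim <= 7u);
    bool w = (wim >> 2) & 1u;
    bool i = (wim >> 1) & 1u;
    bool m = wim & 1u;
    wim_attrs a = {
        .supported = !(w && i),  /* W = 1 together with I = 1 is not supported */
        .cacheable = !i,
        .write_through = w,
        .coherent = m,
    };
    return a;
}
```

Enumerating all eight encodings with this sketch yields exactly six supported settings, which matches the count given in the text.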





5.2.1.5 The Guarded Attribute (G) 


When the guarded bit is set, the memory area (block or page) is designated as guarded. This 
setting can be used to protect certain memory areas from read accesses made by the 
processor that are not dictated directly by the program. If there are areas of physical 
memory that are not fully populated (in other words, there are holes in the physical memory 
map within this area), this setting can protect the system from undesired accesses caused 
by out-of-order load operations or instruction prefetches that could lead to the generation 
of the machine check exception. Also, the guarded bit can be used to prevent out-of-order 
(speculative) load operations or prefetches from occurring to certain peripheral devices that 
produce undesired results when accessed in this way. 


5.2.1.5.1 Performing Operations Out of Order 
An operation is said to be performed in-order if it is guaranteed to be required by the 
sequential execution model. Any other operation is said to be performed out of order. 


Operations are performed out of order by the hardware on the expectation that the results 
will be needed by an instruction that will be required by the sequential execution model. 
Whether the results are really needed is contingent on everything that might divert the 
control flow away from the instruction, such as branch, trap, system call, and rfi 
instructions, and exceptions, and on everything that might change the context in which the 
instruction is executed. 
Typically, the hardware performs operations out of order when it has resources that would
otherwise be idle, so the operation incurs little or no cost. If subsequent events such as 
branches or exceptions indicate that the operation would not have been performed in the 
sequential execution model, the processor abandons any results of the operation (except as 
described below). 


Most operations can be performed out of order, as long as the machine appears to follow 
the sequential execution model. Certain out-of-order operations are restricted, as follows. 


• Stores


A store instruction may not be executed out of order in a manner such that the
alteration of the target location can be observed by other processors or mechanisms.


• Accessing guarded memory


The restrictions for this case are given in Section 5.2.1.5.3, “Out-of-Order Accesses
to Guarded Memory.”


No error of any kind other than a machine check exception may be reported due to an 
operation that is performed out of order, until such time as it is known that the operation is 
required by the sequential execution model. The only other permitted side effects (other 
than machine check) of performing an operation out of order are the following: 


• Referenced and changed bits may be set as described in Section 7.2.5, “Page History
Information.”


• Nonguarded memory locations that could be fetched into a cache by in-order
execution may be fetched out of order into that cache.


5.2.1.5.2 Guarded Memory 

Memory is said to be well behaved if the corresponding physical memory exists and is not 
defective, and if the effects of a single access to it are indistinguishable from the effects of 
multiple identical accesses to it. Data and instructions can be fetched out of order from 
well-behaved memory without causing undesired side effects. 


Memory is said to be guarded if either (a) the G bit is 1 in the relevant PTE or DBAT 
register, or (b) the processor is in real addressing mode (MSR[IR] = 0 or MSR[DR] = 0 for 
instruction fetches or data accesses respectively). In case (b), all of memory is guarded for 
the corresponding accesses. In general, memory that is not well-behaved should be 
guarded. Because such memory may represent an I/O device or may include locations that 
do not exist, an out-of-order access to such memory may cause an I/O device to perform 
incorrect operations or may result in a machine check. 
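The two conditions that make memory guarded can be expressed as a small predicate. In this C sketch, g_bit stands for the G bit from whichever PTE or DBAT register applies, and the access_kind enumeration and function name are hypothetical; the sketch only restates the rule above: with translation disabled for the access type, all memory is guarded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical access kinds; not an architected encoding. */
typedef enum { ACCESS_IFETCH, ACCESS_DATA } access_kind;

/* Returns true if the memory is treated as guarded for this access.
 * g_bit:          G bit from the relevant PTE or DBAT register
 * msr_ir, msr_dr: instruction and data translation enables from the MSR */
static bool is_guarded(access_kind kind, bool g_bit, bool msr_ir, bool msr_dr)
{
    /* MSR[IR] governs instruction fetches; MSR[DR] governs data accesses. */
    bool translation_on = (kind == ACCESS_IFETCH) ? msr_ir : msr_dr;
    if (!translation_on)
        return true;   /* real addressing mode: all memory is guarded */
    return g_bit;      /* otherwise the G bit decides */
}
```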


Note that if separate store instructions access memory that is both caching-inhibited and 
guarded, the accesses are performed in the order specified by the program. If an aligned, 
elementary load or store to caching-inhibited, guarded memory has accessed main memory 
and an external, decrementer, or imprecise-mode floating-point enabled exception is 
pending, the load or store is completed before the exception is taken. 
5.2.1.5.3 Out-of-Order Accesses to Guarded Memory
The circumstances in which guarded memory may be accessed out of order are as follows: 


Load instruction 


If a copy of the target location is in a cache, the location may be accessed in the 
cache or in main memory. 


Instruction fetch 


In real addressing mode (MSR[IR] = 0), an instruction may be fetched if any of the 
following conditions is met: 


— The instruction is in a cache. In this case, it may be fetched from that cache. 


— The instruction is in the same physical page as an instruction that is required by 
the sequential execution model or is in the physical page immediately following 
such a page. 


If MSR[IR] = 1, instructions may not be fetched from either no-execute segments or 
guarded memory. If the effective address of the current instruction is mapped to 
either of these kinds of memory when MSR[IR] = 1, an ISI exception is generated. 
However, it is permissible for an instruction from either of these kinds of memory 
to be in the instruction cache if it was fetched into that cache when its effective 
address was mapped to some other kind of memory. Thus, for example, the 
operating system can access an application's instruction segments as no-execute 
without having to invalidate them in the instruction cache. 


Additionally, instructions are not fetched from direct-store segments (this applies only
when MSR[IR] = 1). If an instruction fetch is attempted from a direct-store segment,
an ISI exception is generated. Note that the direct-store facility is being phased out
of the architecture and will not likely be supported in future devices. Thus, software
should not depend on its effects.
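The real-addressing-mode rule for out-of-order instruction fetch can be sketched as a page comparison. This C sketch assumes the 4-Kbyte page size of the 32-bit architecture; the function and parameter names are hypothetical. A candidate fetch is permitted if the instruction is already cached, or if it lies in the same physical page as an instruction the sequential execution model requires, or in the page immediately following it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u  /* assumed 4-Kbyte physical pages */

/* MSR[IR] = 0 only: may the instruction at candidate_pa be fetched out of
 * order, given the physical address of a required instruction? */
static bool may_prefetch_real_mode(uint32_t required_pa, uint32_t candidate_pa,
                                   bool candidate_in_cache)
{
    if (candidate_in_cache)
        return true;                      /* may be fetched from that cache */
    uint32_t req_page = required_pa >> PAGE_SHIFT;
    uint32_t cand_page = candidate_pa >> PAGE_SHIFT;
    /* Same physical page, or the page immediately following it. */
    return cand_page == req_page || cand_page == req_page + 1u;
}
```

This is why software is advised to keep both the page holding real-mode code and the next physical page well-behaved: the hardware may reach one page beyond the code being executed.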


Note that software should ensure that only well-behaved memory is loaded into a cache, 
either by marking as caching-inhibited (and guarded) all memory that may not be well- 
behaved, or by marking such memory caching-allowed (and guarded) and referring only to 
cache blocks that are well-behaved. 


If a physical page contains instructions that will be executed in real addressing mode 
(MSR[IR] = 0), software should ensure that this physical page and the next physical page 
contain only well-behaved memory. 
5.2.2 I/O Interface Considerations


The PowerPC architecture defines two mechanisms for accessing I/O: 


• Memory-mapped I/O interface operations. SR[T] = 0. These operations are
considered to address memory space and are therefore subject to the same coherency 
control as memory accesses. Depending on the specific I/O interface, the 
memory/cache access attributes (WIMG) and the degree of access ordering 
(requiring eieio or sync instructions) need to be considered. This is the 
recommended way of accessing I/O. 


• Direct-store segment operations. SR[T] = 1. These operations are considered to
address the noncoherent and noncacheable direct-store segment space; therefore, 
hardware need not maintain coherency for these operations, and the cache is 
bypassed completely. Although the architecture defines this direct-store 
functionality, it is being phased out of the architecture and will not likely be 
supported in future devices. Thus, its use is discouraged, and new software should 
not use it or depend on its effects. 


5.2.3 OEA Cache Management Instruction— 
Data Cache Block Invalidate (dcbi) 


As described in Section 5.1.5, “VEA Cache Management Instructions,” the VEA defines
instructions for controlling both the instruction and data caches. The OEA defines one
instruction, the data cache block invalidate (dcbi) instruction, for controlling the data
cache. This section briefly describes the cache management instruction available to
programs at the supervisor privilege level. Additional descriptions of coding the dcbi
instruction are provided in Chapter 4, “Addressing Modes and Instruction Set Summary,”
and Chapter 8, “Instruction Set.” In the following description, the target is the cache block
containing the byte addressed by the effective address.


Any cache management instruction that generates an EA that corresponds to a direct-store 
segment (SR[T] = 1) is treated as a no-op. However, note that the direct-store facility is 
being phased out of the architecture and will not likely be supported in future devices. Thus, 
software should not depend on its effects. 


The action taken depends on the memory/cache access mode associated with the target, and 
on the state of the cache block. The following list describes the action taken for the various 
cases: 


• Coherency required


Unmodified cache block—Invalidates copies of the cache block in the data caches 
of all processors. 


Modified cache block—Invalidates copies of the cache block in the data caches of 
all processors. (Discards the modified data in the cache block.) 
Target block not in cache—If copies of the target are in the data caches of other
processors, dcbi causes those copies to be invalidated, regardless of whether the data 
is modified or unmodified. 


• Coherency not required


Unmodified cache block—Invalidates the cache block in the executing processor's 
data cache. 


Modified cache block—Invalidates the cache block in the executing processor's data
cache. (Discards the modified data in the cache block.) 


Target block not in cache—No action is taken. 
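The cases listed above can be summarized with a toy model that tracks the state of one target block in each processor's data cache. This C sketch is an assumption-laden illustration, not a hardware description: NUM_CPUS, blk_state, and model_dcbi are hypothetical names, and bus transactions are not modeled. It shows the key points: modified data is discarded rather than written back, coherency (M = 1) extends the invalidation to all processors, and an EA in a direct-store segment (SR[T] = 1) makes the instruction a no-op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NUM_CPUS = 4 };  /* hypothetical system size */
typedef enum { BLK_INVALID, BLK_UNMODIFIED, BLK_MODIFIED } blk_state;

/* Apply dcbi, executed by processor `self`, to the target block.
 * coherency_required: M = 1 for the target
 * direct_store:       EA maps to a direct-store segment (SR[T] = 1) */
static void model_dcbi(blk_state caches[NUM_CPUS], size_t self,
                       bool coherency_required, bool direct_store)
{
    assert(self < NUM_CPUS);
    if (direct_store)
        return;                            /* treated as a no-op */
    if (coherency_required) {
        /* Invalidate every processor's copy, modified or not; modified
         * data is discarded, never written back. */
        for (size_t cpu = 0; cpu < NUM_CPUS; cpu++)
            caches[cpu] = BLK_INVALID;
    } else {
        /* Only the executing processor's copy is affected; if it is
         * already invalid, nothing changes ("no action is taken"). */
        caches[self] = BLK_INVALID;
    }
}
```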


The processor treats the dcbi instruction as a store to the addressed byte with respect to
address translation and protection. It is not necessary to set the referenced and changed bits. 


The function of this instruction is independent of the write-through/write-back and 
caching-inhibited/caching-allowed attributes of the target. To ensure coherency, aliased 
effective addresses (two effective addresses that map to the same physical address) must 
have the same page offset. 
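The same-page-offset requirement for aliased effective addresses amounts to comparing the low-order address bits. This C sketch assumes 4-Kbyte pages, so the page offset is the low 12 bits of the effective address; the function name is hypothetical, and it presumes the caller already knows that both addresses map to the same physical address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET_MASK 0xFFFu  /* assumed 4-Kbyte pages: low 12 bits */

/* True if two aliased effective addresses have the same page offset. */
static bool alias_offsets_match(uint32_t ea1, uint32_t ea2)
{
    return (ea1 & PAGE_OFFSET_MASK) == (ea2 & PAGE_OFFSET_MASK);
}
```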
Chapter 6
Exceptions 


The operating environment architecture (OEA) portion of the PowerPC architecture defines 
the mechanism by which PowerPC processors implement exceptions (referred to as 
interrupts in the architecture specification). Exception conditions may be defined at other 
levels of the architecture. For example, the user instruction set architecture (UISA) defines 
conditions that may cause floating-point exceptions; the OEA defines the mechanism by 
which the exception is taken. 


The PowerPC exception mechanism allows the processor to change to supervisor state as a 
result of external signals, errors, or unusual conditions arising in the execution of 
instructions. When exceptions occur, information about the state of the processor is saved 
to certain registers and the processor begins execution at an address (exception vector) 
predetermined for each exception. Processing of exceptions begins in supervisor mode. 


Although multiple exception conditions can map to a single exception vector, a more 
specific condition may be determined by examining a register associated with the 
exception—for example, the DSISR and the floating-point status and control register 
(FPSCR). Additionally, certain exception conditions can be explicitly enabled or disabled 
by software. 


The PowerPC architecture requires that exceptions be taken in program order; therefore, 
although a particular implementation may recognize exception conditions out of order, they 
are handled strictly in order with respect to the instruction stream. When an instruction- 
caused exception is recognized, any unexecuted instructions that appear earlier in the 
instruction stream, including any that have not yet entered the execute state, are required to 
complete before the exception is taken. For example, if a single instruction encounters 
multiple exception conditions, those exceptions are taken and handled sequentially. 
Likewise, exceptions that are asynchronous and precise are recognized when they occur, 
but are not handled until all instructions currently in the execute stage successfully 
complete execution and report their results. 


Note that exceptions can occur while an exception handler routine is executing, and 
multiple exceptions can become nested. It is up to the exception handler to save the 
appropriate machine state if it is desired to allow control to ultimately return to the 
excepting program. 
In many cases, after the exception handler handles an exception, there is an attempt to
execute the instruction that caused the exception. Instruction execution continues until the 
next exception condition is encountered. This method of recognizing and handling 
exception conditions sequentially guarantees that the machine state is recoverable and 
processing can resume without losing instruction results. 


Exception handlers must save the contents of SRR0 and SRR1 soon after the exception is
taken; otherwise, another exception could overwrite these registers and the state
information they hold would be lost.


In this chapter, the following terminology is used to describe the various stages of exception 
processing: 


Recognition     Exception recognition occurs when the condition that can cause an
                exception is identified by the processor.


Taken           An exception is said to be taken when control of instruction
                execution is passed to the exception handler; that is, the context is
                saved and the instruction at the appropriate vector offset is fetched
                and the exception handler routine is begun in supervisor mode.


Handling        Exception handling is performed by the software linked to the
                appropriate vector offset. Exception handling is begun in supervisor
                mode (referred to as privileged state in the architecture
                specification).
6.1 Exception Classes


As specified by the PowerPC architecture, all exceptions can be described as either precise 
or imprecise and either synchronous or asynchronous. Asynchronous exceptions are caused 
by events external to the processor’s execution; synchronous exceptions are caused by 
instructions. 


The PowerPC exception types are shown in Table 6-1. 


Table 6-1. PowerPC Exception Classifications

Type                        Exception

Asynchronous/nonmaskable    Machine check
                            System reset

Asynchronous/maskable       External interrupt
                            Decrementer

Synchronous/imprecise       Instruction-caused imprecise exceptions
                            (floating-point imprecise exceptions)

Synchronous/precise         Instruction-caused exceptions, excluding floating-point
                            imprecise exceptions
Exceptions, their offsets, and the conditions that cause them are summarized in Table 6-2. The
exception vectors described in the table correspond to physical address locations, 
depending on the value of MSR[IP]. Refer to Section 7.2.1.2, “Predefined Physical 
Memory Locations,” for a complete list of the predefined physical memory areas. 
Remaining sections in this chapter provide more complete descriptions of the exceptions 
and of the conditions that cause them. 
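Because the vector offsets are combined with a prefix selected by MSR[IP], the physical address of a vector can be computed as shown in this C sketch. It assumes that MSR[IP] = 0 places vectors at 0x000n_nnnn and MSR[IP] = 1 at 0xFFFn_nnnn; check this against the MSR description for your processor. Only a subset of the offsets from Table 6-2 is listed, and the enumerator names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A subset of the vector offsets given in Table 6-2. */
enum {
    VEC_SYSTEM_RESET  = 0x00100,
    VEC_MACHINE_CHECK = 0x00200,
    VEC_DSI           = 0x00300,
    VEC_ISI           = 0x00400,
    VEC_ALIGNMENT     = 0x00600,
    VEC_PROGRAM       = 0x00700,
    VEC_DECREMENTER   = 0x00900,
    VEC_SYSTEM_CALL   = 0x00C00,
    VEC_TRACE         = 0x00D00
};

/* Physical address of an exception vector (prefix rule is an assumption). */
static uint32_t vector_address(uint32_t offset, bool msr_ip)
{
    assert(offset < 0x00100000u);   /* offsets fit in the low 20 bits */
    uint32_t prefix = msr_ip ? 0xFFF00000u : 0x00000000u;
    return prefix | offset;
}
```

For example, with MSR[IP] = 1, the system reset vector resolves to 0xFFF00100, which lets a system boot from ROM mapped at the top of the physical address space.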
Table 6-2. Exceptions and Conditions—Overview

Exception (Vector Offset, hex) / Causing Conditions


System reset (00100)
The causes of system reset exceptions are implementation-dependent. If the conditions
that cause the exception also cause the processor state to be corrupted such that the
contents of SRR0 and SRR1 are no longer valid, or such that other processor resources
are so corrupted that the processor cannot reliably resume execution, the RI bit copied
from the MSR to SRR1 is cleared.


Machine check (00200)
The causes of machine check exceptions are implementation-dependent, but typically
they are related to conditions such as bus parity errors or attempts to access an invalid
physical address. Typically, these exceptions are triggered by an input signal to the
processor. Note that not all processors provide the same level of error checking.
The machine check exception is disabled when MSR[ME] = 0. If a machine check
exception condition exists and the ME bit is cleared, the processor goes into the
checkstop state.
If the conditions that cause the exception also cause the processor state to be corrupted
such that the contents of SRR0 and SRR1 are no longer valid, or such that other
processor resources are so corrupted that the processor cannot reliably resume
execution, the RI bit copied from the MSR to SRR1 is cleared.
(Note that physical address is referred to as real address in the architecture
specification.)


DSI (00300)
A DSI exception occurs when a data memory access cannot be performed for any of the
reasons described in Section 6.4.3, “DSI Exception (0x00300).” Such accesses can be
generated by load/store instructions, certain memory control instructions, and certain
cache control instructions.


ISI (00400)
An ISI exception occurs when an instruction fetch cannot be performed for a variety of
reasons described in Section 6.4.4, “ISI Exception (0x00400).”


External interrupt (00500)
An external interrupt is generated only when an external interrupt is pending (typically
signaled by a signal defined by the implementation) and the interrupt is enabled
(MSR[EE] = 1).


Alignment (00600)
An alignment exception may occur when the processor cannot perform a memory
access for reasons described in Section 6.4.6, “Alignment Exception (0x00600).”
Note that an implementation is allowed to perform the operation correctly and not cause
an alignment exception.


Program (00700)
A program exception is caused by one of the following exception conditions, which
correspond to bit settings in SRR1 and arise during execution of an instruction:

• Floating-point enabled exception: a floating-point enabled exception condition is
  generated when MSR[FE0-FE1] ≠ 00 and FPSCR[FEX] is set. The settings of FE0
  and FE1 are described in Table 6-3.
  FPSCR[FEX] is set by the execution of a floating-point instruction that causes an
  enabled exception or by the execution of a Move to FPSCR instruction that sets both
  an exception condition bit and its corresponding enable bit in the FPSCR. These
  exceptions are described in Section 3.3.6, “Floating-Point Program Exceptions.”

• Illegal instruction: an illegal instruction program exception is generated when
  execution of an instruction is attempted with an illegal opcode or illegal combination of
  opcode and extended opcode fields, or when execution of an optional instruction not
  provided in the specific implementation is attempted (these do not include those
  optional instructions that are treated as no-ops). The PowerPC instruction set is
  described in Chapter 4, “Addressing Modes and Instruction Set Summary.” See
  Section 6.4.7, “Program Exception (0x00700),” for a complete list of causes for an
  illegal instruction program exception.

• Privileged instruction: a privileged instruction type program exception is generated
  when the execution of a privileged instruction is attempted and the MSR user
  privilege bit, MSR[PR], is set. This exception is also generated for mtspr or mfspr
  with an invalid SPR field if spr[0] = 1 and MSR[PR] = 1.

• Trap: a trap type program exception is generated when any of the conditions
  specified in a trap instruction is met.

For more information, refer to Section 6.4.7, “Program Exception (0x00700).”


Floating-point unavailable (00800)
A floating-point unavailable exception is caused by an attempt to execute a floating-point
instruction (including floating-point load, store, and move instructions) when the
floating-point available bit is cleared, MSR[FP] = 0.


Decrementer (00900)
The decrementer interrupt exception is taken if the exception is enabled (MSR[EE] = 1)
and it is pending. The exception is created when the most-significant bit of the
decrementer changes from 0 to 1. If it is not enabled, the exception remains pending
until it is taken.


Reserved (00A00)
This is reserved for implementation-specific exceptions. For example, the 601 uses this
vector offset for direct-store exceptions.


System call (00C00)
A system call exception occurs when a System Call (sc) instruction is executed.


Trace (00D00)
Implementation of the trace exception is optional. If implemented, it occurs if either
MSR[SE] = 1 and almost any instruction successfully completed, or MSR[BE] = 1 and a
branch instruction is completed. See Section 6.4.11, “Trace Exception (0x00D00),” for
more information.


Floating-point assist (00E00)
Implementation of the floating-point assist exception is optional. This exception can be
used to provide software assistance for infrequent and complex floating-point operations
such as denormalization.


Reserved (01000-02FFF)
This range is reserved for implementation-specific purposes. It may be used for
implementation-specific exception vectors or other uses.
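The decrementer entry in Table 6-2 describes an edge condition (the most-significant bit changing from 0 to 1) that is gated by MSR[EE]. The toy C model below assumes a 32-bit decrementer that counts down by one per tick, so the 0-to-1 change of the most-significant bit happens when the value wraps from 0x00000000 to 0xFFFFFFFF. The names dec_model, dec_tick, and dec_would_be_taken are hypothetical, and how a pending exception is later cleared is deliberately not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model state. */
typedef struct {
    uint32_t dec;     /* decrementer value */
    bool pending;     /* decrementer exception pending */
} dec_model;

/* Advance the decrementer by one tick; the exception becomes pending when
 * the most-significant bit changes from 0 to 1. */
static void dec_tick(dec_model *d)
{
    uint32_t old = d->dec;
    d->dec = old - 1u;                               /* 0 wraps to 0xFFFFFFFF */
    if (!(old & 0x80000000u) && (d->dec & 0x80000000u))
        d->pending = true;
}

/* The exception is taken only if it is pending and MSR[EE] = 1;
 * otherwise it simply remains pending. */
static bool dec_would_be_taken(const dec_model *d, bool msr_ee)
{
    return d->pending && msr_ee;
}
```

A typical trace: starting from 1, one tick reaches 0 with nothing pending; the next tick wraps to 0xFFFFFFFF and sets pending, which is taken as soon as MSR[EE] is set.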







6.1.1 Precise Exceptions 


When a precise exception occurs, SRR0 is set to point to an instruction such that all prior
instructions in the instruction stream have completed execution and no subsequent
instruction has begun execution. However, depending on the exception type, the instruction
addressed by SRR0 may not have completed execution.


When an exception occurs, instruction dispatch (the issuance of instructions by the 
instruction fetch unit to any instruction execution mechanism) is halted and the following 
synchronization is performed: 


1. The exception mechanism waits for all previous instructions in the instruction 
stream to complete to a point where they report all exceptions they will cause. 


2. The processor ensures that all previous instructions in the instruction stream 
complete in the context in which they began execution. 


3. The exception mechanism implemented in hardware and the software handler are
responsible for saving and restoring the processor state.


The synchronization described conforms to the requirements for context synchronization,
which is described fully in the following section.


6.1.2 Synchronization 


The synchronization described in this section refers to the state of activities within the 
processor that performs the synchronization. 


6.1.2.1 Context Synchronization 


An instruction or event is context synchronizing if it satisfies all the requirements listed 
below. Such instructions and events are collectively called context-synchronizing 
operations. Examples of context-synchronizing operations include the sc and rfi
instructions and most exceptions. A context-synchronizing operation has the following 
characteristics: 


1. The operation causes instruction dispatching (the issuance of instructions by the 
instruction fetch mechanism to any instruction execution mechanism) to be halted. 


2. The operation is not initiated or, in the case of isync, does not complete, until all 
instructions in execution have completed to a point at which they have reported all 
exceptions they will cause. 


If a prior memory access instruction causes one or more direct-store interface error 
exceptions, the results are guaranteed to be determined before this instruction is 
executed. However, note that the direct-store facility is being phased out of the 
architecture and will not likely be supported in future devices. 
3. Instructions that precede the operation complete execution in the context (for
example, the privilege, translation mode, and memory protection) in which they
were initiated.


4. If the operation either directly causes an exception (for example, the sc instruction
causes a system call exception) or is an exception, the operation is not initiated until 
no exception exists having higher priority than the exception associated with the 
context-synchronizing operation. 


A context-synchronizing operation is necessarily execution synchronizing. Unlike the sync
instruction, a context-synchronizing operation need not wait for memory-related operations 
to complete on other processors, or for referenced and changed bits in the page table to be 
updated. 


6.1.2.2 Execution Synchronization 

An instruction is execution synchronizing if it satisfies the conditions of the first two items
described above for context synchronization. The sync instruction is treated like isync with
respect to the second item described above (that is, the conditions described in the second
item apply to the completion of sync). The sync and mtmsr instructions are examples of
execution-synchronizing instructions.


All context-synchronizing instructions are execution-synchronizing. Unlike a context- 
synchronizing operation, an execution-synchronizing instruction need not ensure that the 
subsequent instructions execute in the context established by that instruction. This new 
context becomes effective sometime after the execution-synchronizing instruction 
completes and before or at a subsequent context-synchronizing operation. 


6.1.2.3 Synchronous/Precise Exceptions 
When instruction execution causes a precise exception, the following conditions exist at the 
exception point: 
• Depending on the type of exception, SRR0 addresses either the instruction causing
the exception or the immediately following instruction. The instruction addressed
can be determined from the exception type and status bits, which are defined in the
description of each exception.


• All instructions that precede the excepting instruction complete before the exception
is processed. However, some memory accesses generated by these preceding
instructions may not have been performed with respect to all other processors or
system devices.


• The instruction causing the exception may not have begun execution, may have
partially completed, or may have completed, depending on the exception type.
Handling of partially executed instructions is described in Section 6.1.4, “Partially
Executed Instructions.”


• Architecturally, no subsequent instruction has begun execution.
While instruction parallelism allows the possibility of multiple instructions reporting
exceptions during the same cycle, they are handled one at a time in program order. 
Exception priorities are described in Section 6.1.5, “Exception Priorities.” 


6.1.2.4 Asynchronous Exceptions 

There are four asynchronous exceptions: system reset and machine check, which are
nonmaskable and highest-priority exceptions, and external interrupt and decrementer
exceptions, which are maskable and low-priority. These two types of asynchronous
exceptions are discussed separately.


6.1.2.4.1 System Reset and Machine Check Exceptions 

System reset and machine check exceptions have the highest priority and can occur while 
other exceptions are being processed. Note that nonmaskable, asynchronous exceptions are 
never delayed; therefore, if two of these exceptions occur in immediate succession, the state 
information saved by the first exception may be overwritten when the subsequent exception 
occurs. Note that these exceptions are context-synchronizing if they are recoverable 
(MSR[RI] is copied from the MSR to SRR1 if the exception does not cause loss of state.) 
If the RI bit is clear (nonrecoverable), the exception is context-synchronizing only with 
respect to subsequent instructions. 


These exceptions cannot be masked by using the MSR[EE] bit. However, if the machine 
check enable bit, MSR[ME], is cleared and a machine check exception condition occurs, 
the processor goes directly into checkstop state as the result of the exception condition. 
When one of these exceptions occurs, the following conditions exist at the exception point:


• For system reset exceptions, SRR0 addresses the instruction that would have
attempted to execute next if the exception had not occurred.


• For machine check exceptions, SRR0 holds the address of either an instruction that
would have completed or some instruction following it that would have completed if
the exception had not occurred.


• An exception is generated such that all instructions preceding the instruction
addressed by SRR0 appear to have completed with respect to the executing
processor.


Note that a bit in the MSR (MSR[RI]) indicates whether enough of the machine state was 
saved to allow the processor to resume processing. 
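A machine check handler can combine the two rules above: MSR[ME] decides whether the exception is taken at all, and the RI bit saved in SRR1 tells the handler whether it can safely resume. This C sketch assumes the 32-bit MSR layout with IBM bit numbering (bit 0 is the most-significant bit), placing ME at bit 19 (mask 0x00001000) and RI at bit 30 (mask 0x00000002), with SRR1 holding RI in the same position. The enumeration and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MSR/SRR1 bit masks (IBM numbering, bit 0 = MSB). */
#define MSR_ME 0x00001000u   /* bit 19: machine check enable */
#define MSR_RI 0x00000002u   /* bit 30: recoverable exception */

typedef enum {
    MC_CHECKSTOP,              /* MSR[ME] = 0: processor enters checkstop */
    MC_TAKEN_RECOVERABLE,      /* taken, and SRR1[RI] = 1 */
    MC_TAKEN_NONRECOVERABLE    /* taken, but state was lost (SRR1[RI] = 0) */
} mc_outcome;

/* msr:  MSR value when the machine check condition occurs
 * srr1: SRR1 contents seen by the handler if the exception is taken */
static mc_outcome machine_check_outcome(uint32_t msr, uint32_t srr1)
{
    if (!(msr & MSR_ME))
        return MC_CHECKSTOP;
    return (srr1 & MSR_RI) ? MC_TAKEN_RECOVERABLE : MC_TAKEN_NONRECOVERABLE;
}
```

In a nonrecoverable case the handler should not attempt to return with rfi, because SRR0 and SRR1 may no longer describe the interrupted context.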
6.1.2.4.2 External Interrupt and Decrementer Exceptions


For the external interrupt and decrementer exceptions, the following conditions exist at the
exception point (assuming these exceptions are enabled, that is, MSR[EE] is set):


• All instructions issued before the exception is taken and any instructions that
precede those instructions in the instruction stream appear to have completed before
the exception is processed.


• No subsequent instructions in the instruction stream have begun execution.


• SRR0 addresses the instruction that would have been executed had the exception not
occurred.


That is, these exceptions are context-synchronizing. The external interrupt and decrementer 
exceptions are maskable. When the machine state register external interrupt enable bit is 
cleared (MSR[EE] = 0), these exception conditions are not recognized until the EE bit is 
set. MSR[EE] is cleared automatically when an exception is taken, to delay recognition of 
subsequent exception conditions. No two precise exceptions can be recognized 
simultaneously. Exception handling does not begin until all currently executing instructions 
complete and any synchronous, precise exceptions caused by those instructions have been 
handled. Exception priorities are described in Section 6.1.5, “Exception Priorities.” 


6.1.3 Imprecise Exceptions 


The PowerPC architecture defines one imprecise exception, the imprecise floating-point 
enabled exception. This is implemented as one of the conditions that can cause a program 
exception. 


6.1.3.1 Imprecise Exception Status Description 


When the execution of an instruction causes an imprecise exception, SRR0 contains
information related to the address of the excepting instruction as follows:


• SRR0 contains the address of either the instruction that caused the exception or of
some instruction following that instruction.


• The exception is generated such that all instructions preceding the instruction
addressed by SRR0 have completed with respect to the processor.


• If the imprecise exception is caused by the context-synchronizing mechanism (due
to an instruction that caused another exception, for example, an alignment or DSI
exception), then SRR0 contains the address of the instruction that caused the
exception, and that instruction may have been partially executed (refer to
Section 6.1.4, “Partially Executed Instructions”).


• If the imprecise exception is caused by an execution-synchronizing instruction other
than sync or isync, SRR0 addresses the instruction causing the exception.
Additionally, besides causing the exception, that instruction is considered not to
have begun execution. If the exception is caused by the sync or isync instruction,
SRR0 may address either the sync or isync instruction, or the following instruction.


• If the imprecise exception is not forced by either the context-synchronizing
mechanism or the execution-synchronizing mechanism, the instruction addressed by
SRR0 is considered not to have begun execution if it is not the instruction that caused
the exception.


• When an imprecise exception occurs, no instruction following the instruction
addressed by SRR0 is considered to have begun execution.


6.1.3.2 Recoverability of Imprecise Floating-Point Exceptions 

The enabled IEEE floating-point exception mode bits in the MSR (FE0 and FE1) together
define whether IEEE floating-point exceptions are handled precisely, imprecisely, or
whether they are taken at all. The possible settings are shown in Table 6-3. For further
details, see Section 3.3.6, “Floating-Point Program Exceptions.”


Table 6-3. IEEE Floating-Point Program Exception Mode Bits


FE0   FE1   Mode
0     0     Floating-point exceptions ignored
0     1     Floating-point imprecise nonrecoverable
1     0     Floating-point imprecise recoverable
1     1     Floating-point precise mode

As shown in the table, the imprecise floating-point enabled exception has two
modes, nonrecoverable and recoverable. These modes are specified by setting the
MSR[FE0] and MSR[FE1] bits and are described as follows:


• Imprecise nonrecoverable floating-point enabled mode. MSR[FE0] = 0;
MSR[FE1] = 1. When an exception occurs, the exception handler is invoked at some
point at or beyond the instruction that caused the exception. It may not be possible
to identify the excepting instruction or the data that caused the exception. Results
from the excepting instruction may have been used by, or may have affected,
subsequent instructions executed before the exception handler was invoked.


• Imprecise recoverable floating-point enabled mode. MSR[FE0] = 1; MSR[FE1] = 0.
When an exception occurs, the floating-point enabled exception handler is invoked
at some point at or beyond the instruction that caused the exception. Sufficient
information is provided to the exception handler that it can identify the excepting
instruction and correct any faulty results. In this mode, no incorrect results caused
by the excepting instruction have been used by or affected subsequent instructions
that are executed before the exception handler is invoked.


Although these exceptions are maskable with these bits, they differ from other maskable 
exceptions in that the masking is usually controlled by the application program rather than 
by the operating system. 
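Because the mode is a function of just two MSR bits, Table 6-3 reduces to a small lookup. The following Python sketch is illustrative only; the constant and function names are hypothetical, and bit positions follow the big-endian convention (MSR bit n has mask 1 << (31 - n)).

```python
# Decode MSR[FE0] and MSR[FE1] into the modes of Table 6-3.
# Big-endian bit numbering: MSR bit n has mask 1 << (31 - n).
MSR_FE0 = 1 << (31 - 20)  # floating-point exception mode 0 = 0x800
MSR_FE1 = 1 << (31 - 23)  # floating-point exception mode 1 = 0x100

FP_MODES = {
    (0, 0): "ignored",
    (0, 1): "imprecise nonrecoverable",
    (1, 0): "imprecise recoverable",
    (1, 1): "precise",
}


def fp_exception_mode(msr: int) -> str:
    """Return the IEEE floating-point enabled exception mode selected by the MSR."""
    fe0 = 1 if msr & MSR_FE0 else 0
    fe1 = 1 if msr & MSR_FE1 else 0
    return FP_MODES[(fe0, fe1)]
```

Note that the two imprecise modes are exactly the settings in which FE0 and FE1 differ.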
6.1.4 Partially Executed Instructions


The architecture permits certain instructions to be partially executed when an alignment 
exception or DSI exception occurs, or an imprecise floating-point exception is forced by an 
instruction that causes an alignment or DSI exception. They are as follows: 


• Load multiple/string instructions that cause an alignment or DSI exception—Some
registers in the range of registers to be loaded may have been loaded.


• Store multiple/string instructions that cause an alignment or DSI exception—Some
bytes in the addressed memory range may have been updated.


• Non-multiple/string store instructions that cause an alignment or DSI
exception—Some bytes just before the boundary may have been updated. If the
instruction normally alters CR0 (stwcx.), CR0 is set to an undefined value. For
instructions that perform register updates, the update register (rA) is not altered.


• Floating-point load instructions that cause an alignment or DSI exception—The
target register may be altered. For update forms, the update register (rA) is not
altered.


• A load or store to a direct-store segment that causes a DSI exception due to a
direct-store interface error exception—Some of the associated address/data transfers may
not have been initiated. All initiated transfers are completed before the exception is
reported, and the transfers that have not been initiated are aborted. Thus the
instruction completes before the DSI exception occurs. However, note that the
direct-store facility is being phased out of the architecture and will not likely be
supported in future devices.


In the cases above, the number of registers and the amount of memory altered are 
implementation-, instruction-, and boundary-dependent. However, memory protection is 
not violated. Furthermore, if some of the data accessed is in a direct-store segment and the 
instruction is not supported for use in such memory space, the locations in the direct-store 
segment are not accessed. Again, note that the direct-store facility is being phased out of 
the architecture and will not likely be supported in future devices. 


Partial execution is not allowed when integer load operations (except multiple/string 
operations) cause an alignment or DSI exception. The target register is not altered. For 
update forms of the integer load instructions, the update register (rA) is not altered. 
6.1.5 Exception Priorities


Exceptions are roughly prioritized by exception class, as follows: 


1. Nonmaskable, asynchronous exceptions have priority over all other 
exceptions—system reset and machine check exceptions (although the machine 
check exception condition can be disabled so that the condition causes the processor 
to go directly into the checkstop state). These two types of exceptions in this class 
cannot be delayed by exceptions in other classes, and do not wait for the completion 
of any precise exception handling. 


2. Synchronous, precise exceptions are caused by instructions and are taken in strict 
program order. 


3. If an imprecise exception exists (the instruction that caused the exception has been 
completed and is required by the sequential execution model), exceptions signaled 
by instructions subsequent to the instruction that caused the exception are not 
permitted to change the architectural state of the processor. The exception causes an 
imprecise program exception unless a machine check or system reset exception is 
pending. 


4. Maskable asynchronous exceptions (external interrupt and decrementer exceptions) 
have lowest priority. 
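The four classes, together with the ordering inside each class given in Table 6-4, can be expressed as a simple arbitration rule. The Python sketch below is a deliberate simplification: the names are hypothetical, "sync_precise" stands for any instruction-caused exception (those are further ordered by program order and by the per-instruction rules), and MSR-based enabling is ignored.

```python
# Simplified arbitration among pending exceptions, highest priority first.
# "sync_precise" stands for any instruction-caused exception; those are
# additionally ordered by program order and by the per-instruction rules.
PRIORITY = [
    "system_reset",        # nonmaskable, asynchronous
    "machine_check",       # nonmaskable, asynchronous
    "sync_precise",        # synchronous, precise
    "imprecise_fp",        # imprecise floating-point enabled program exception
    "external_interrupt",  # maskable, asynchronous
    "decrementer",         # maskable, asynchronous (lowest)
]


def next_exception(pending):
    """Return the highest-priority name in `pending`, or None if it is empty."""
    if not pending:
        return None
    return min(pending, key=PRIORITY.index)
```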


The exceptions are listed in Table 6-4 in order of highest to lowest priority. 


Table 6-4. Exception Priorities


Exception Class / Exception


Nonmaskable, asynchronous

System reset—The system reset exception has the highest priority of all exceptions. If this
exception exists, the exception mechanism ignores all other exceptions and generates a
system reset exception. When the system reset exception is generated, previously issued
instructions can no longer generate exception conditions that cause a nonmaskable
exception.


Machine check—The machine check exception is the second-highest priority exception. If
this exception occurs, the exception mechanism ignores all other exceptions (except reset)
and generates a machine check exception. When the machine check exception is
generated, previously issued instructions can no longer generate exception conditions that
cause a nonmaskable exception.
Synchronous, precise

Instruction dependent—When an instruction causes an exception, the exception
mechanism waits for any instructions prior to the excepting instruction in the instruction
stream to complete. Any exceptions caused by these instructions are handled first. It then
generates the appropriate exception if no higher priority exception exists when the
exception is to be generated.
Note that a single instruction can cause multiple exceptions. When this occurs, those
exceptions are ordered in priority as indicated in the following:
A. Integer loads and stores
   a. Alignment
   b. DSI
   c. Trace (if implemented)
B. Floating-point loads and stores
   a. Floating-point unavailable
   b. Alignment
   c. DSI
   d. Trace (if implemented)
C. Other floating-point instructions
   a. Floating-point unavailable
   b. Program—Precise-mode floating-point enabled exception
   c. Floating-point assist (if implemented)
   d. Trace (if implemented)
D. rfi and mtmsr
   a. Program—Privileged Instruction
   b. Program—Precise-mode floating-point enabled exception
   c. Trace (if implemented), for mtmsr only
   If precise-mode IEEE floating-point enabled exceptions are enabled and the
   FPSCR[FEX] bit is set, a program exception occurs no later than the next
   synchronizing event.
E. Other instructions
   a. These exceptions are mutually exclusive and have the same priority:
      — Program: Trap
      — System call (sc)
      — Program: Privileged Instruction
      — Program: Illegal Instruction
   b. Trace (if implemented)
F. ISI exception
   The ISI exception has the lowest priority in this category. It is only recognized when all
   instructions prior to the instruction causing this exception appear to have completed and
   that instruction is to be executed. The priority of this exception is specified for
   completeness and to ensure that it is not given more favorable treatment. An
   implementation can treat this exception as though it had a lower priority.


Imprecise

Program imprecise floating-point mode enabled exceptions—When this exception occurs,
the exception handler is invoked at or beyond the floating-point instruction that caused the
exception. The PowerPC architecture supports recoverable and nonrecoverable imprecise
modes, which are enabled when exactly one of MSR[FE0] and MSR[FE1] is set (see
Table 6-3). For more information, see Section 6.1.3, “Imprecise Exceptions.”
Maskable, asynchronous

External interrupt—The external interrupt mechanism waits for instructions currently or
previously dispatched to complete execution. After all such instructions are completed, and
any exceptions caused by those instructions have been handled, the exception mechanism
generates this exception if no higher priority exception exists. This exception is enabled
only if MSR[EE] is currently set. If EE is zero when the exception is detected, it is delayed
until the bit is set.


Decrementer—This exception is the lowest priority exception. When this exception is
created, the exception mechanism waits for all other possible exceptions to be reported. It
then generates this exception if no higher priority exception exists. This exception is
enabled only if MSR[EE] is currently set. If EE is zero when the exception is detected, it is
delayed until the bit is set.
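When one instruction raises several exceptions, the ordering in categories A through D of the synchronous, precise row amounts to a per-category priority list. The Python sketch below encodes those lists; the category keys and exception names are hypothetical labels chosen for illustration. Category E is omitted because its exceptions are mutually exclusive, and the ISI case is not modeled.

```python
# Priority, highest first, when one instruction raises several exceptions.
# Keys mirror categories A-D of Table 6-4; optional exceptions such as
# trace and floating-point assist are listed even if not implemented.
PER_INSTRUCTION = {
    "integer_load_store": ["alignment", "dsi", "trace"],
    "fp_load_store": ["fp_unavailable", "alignment", "dsi", "trace"],
    "other_fp": ["fp_unavailable", "program_fp_precise", "fp_assist", "trace"],
    # For rfi and mtmsr, trace applies to mtmsr only.
    "rfi_mtmsr": ["program_privileged", "program_fp_precise", "trace"],
}


def exception_to_report(category, raised):
    """Return the highest-priority exception in `raised` for this category."""
    for exc in PER_INSTRUCTION[category]:
        if exc in raised:
            return exc
    return None
```

For example, a floating-point load that is both misaligned and issued with MSR[FP] = 0 reports the floating-point unavailable exception first.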





Nonmaskable, asynchronous exceptions (namely, system reset or machine check 
exceptions) may occur at any time. That is, these exceptions are not delayed if another 
exception is being handled (although machine check exceptions can be delayed by system 
reset exceptions). As a result, state information for the interrupted exception handler may 
be lost. 


All other exceptions have lower priority than system reset and machine check exceptions, 
and the exception may not be taken immediately when it is recognized. Only one 
synchronous, precise exception can be reported at a time. If a maskable, asynchronous or 
an imprecise exception condition occurs while instruction-caused exceptions are being 
processed, its handling is delayed until all exceptions caused by previous instructions in the 
program flow are handled and those instructions complete execution. 


6.2 Exception Processing 


When an exception is taken, the processor uses the save/restore registers, SRR1 and SRR0,
respectively, to save the contents of the MSR for the interrupted process and to help
determine where instruction execution should resume after the exception is handled.


When an exception occurs, the address saved in SRR0 is used to help calculate where
instruction processing should resume when the exception handler returns control to the
interrupted process. Depending on the exception, this may be the address in SRR0 or the
next address in the program flow. All instructions in the program flow preceding this one
will have completed execution, and no subsequent instruction will have begun execution.
This may be the address of the instruction that caused the exception or the next one (as in
the case of a system call or trap exception). The SRR0 register is shown in Figure 6-1.
[Figure: SRR0 bits 0-29 hold the EA of an instruction in the interrupted program flow; bits 30-31 are reserved (00).]


Figure 6-1. Machine Status Save/Restore Register 0


The save/restore register 1 (SRR1) is used to save machine status (selected bits from the 
MSR and other implementation-specific status bits as well) on exceptions and to restore 
those values when rfi is executed. SRR1 is shown in Figure 6-2. 


[Figure: SRR1 bits 0-31 hold exception-specific information and MSR bit values.]


Figure 6-2. Machine Status Save/Restore Register 1


When an exception occurs, SRR1 bits 1-4 and 10-15 are loaded with exception-specific
information and MSR bits 16-23, 25-27, and 30-31 are placed into the corresponding bit
positions of SRR1. Depending on the implementation, additional bits of the MSR may be
copied to SRR1.
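The SRR1 layout described above can be captured as two bit masks. In this Python sketch, exc_info is a hypothetical input standing for whatever exception-specific status the processor supplies, and implementation-specific extra MSR bits are ignored. Bit numbering is big-endian (bit 0 is the most-significant bit).

```python
def bit(n: int) -> int:
    """Mask for big-endian bit n of a 32-bit register (bit 0 = MSB)."""
    return 1 << (31 - n)


def bit_range(first: int, last: int) -> int:
    """Mask covering big-endian bits first..last inclusive."""
    mask = 0
    for n in range(first, last + 1):
        mask |= bit(n)
    return mask


SRR1_EXC_INFO = bit_range(1, 4) | bit_range(10, 15)                   # 0x783F0000
MSR_SAVED = bit_range(16, 23) | bit_range(25, 27) | bit_range(30, 31)  # 0x0000FF73


def build_srr1(msr: int, exc_info: int) -> int:
    """Combine exception-specific status with the saved MSR bits."""
    return (exc_info & SRR1_EXC_INFO) | (msr & MSR_SAVED)
```

Because the two masks do not overlap, the exception-specific status and the saved MSR bits never collide in SRR1.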


Note that, in some implementations, every instruction fetch when MSR[IR] = 1, and every 
data access requiring address translation when MSR[DR] = 1, may modify SRRO and 
SRR1. 


The MSR is 32 bits wide, as shown in Figure 6-3. Note that the 32-bit implementation of
the MSR consists of the 32 least-significant bits of the 64-bit MSR.








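[Figure: MSR fields by bit: Reserved (0-12), POW (13), Reserved (14), ILE (15), EE (16), PR (17), FP (18), ME (19), FE0 (20), SE (21), BE (22), FE1 (23), Reserved (24), IP (25), IR (26), DR (27), Reserved (28-29), RI (30), LE (31).]


Figure 6-3. Machine State Register (MSR)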
Table 6-5 shows the bit definitions for the MSR.
Table 6-5. MSR Bit Settings


Bit(s)  Name  Description
0-12          Reserved
13      POW   Power management enable
              0 Power management disabled (normal operation mode).
              1 Power management enabled (reduced power mode).
              Note: Power management functions are implementation-dependent. If the function is not
              implemented, this bit is treated as reserved.
14            Reserved
15      ILE   Exception little-endian mode. When an exception occurs, this bit is copied into MSR[LE] to select the
              endian mode for the context established by the exception.
16      EE    External interrupt enable
              0 While the bit is cleared, the processor delays recognition of external interrupts and decrementer
                exception conditions.
              1 The processor is enabled to take an external interrupt or the decrementer exception.
17      PR    Privilege level
              0 The processor can execute both user- and supervisor-level instructions.
              1 The processor can only execute user-level instructions.
18      FP    Floating-point available
              0 The processor prevents dispatch of floating-point instructions, including floating-point loads,
                stores, and moves.
              1 The processor can execute floating-point instructions.
19      ME    Machine check enable
              0 Machine check exceptions are disabled.
              1 Machine check exceptions are enabled.
20      FE0   Floating-point exception mode 0 (see Table 2-10).
21      SE    Single-step trace enable (Optional)
              0 The processor executes instructions normally.
              1 The processor generates a single-step trace exception upon the successful execution of the
                next instruction.
              Note: If the function is not implemented, this bit is treated as reserved.
22      BE    Branch trace enable (Optional)
              0 The processor executes branch instructions normally.
              1 The processor generates a branch trace exception after completing the execution of a branch
                instruction, regardless of whether or not the branch was taken.
              Note: If the function is not implemented, this bit is treated as reserved.
23      FE1   Floating-point exception mode 1 (see Table 2-10).
24            Reserved
25      IP    Exception prefix. The setting of this bit specifies whether an exception vector offset is prepended
              with Fs or 0s. In the following description, nnnnn is the offset of the exception vector. See Table 6-2.
              0 Exceptions are vectored to the physical address 0x000n_nnnn.
              1 Exceptions are vectored to the physical address 0xFFFn_nnnn.
              In most systems, IP is set to 1 during system initialization, and then cleared to 0 when initialization is
              complete.
26      IR    Instruction address translation
              0 Instruction address translation is disabled.
              1 Instruction address translation is enabled.
              For more information, see Chapter 7, “Memory Management.”
27      DR    Data address translation
              0 Data address translation is disabled.
              1 Data address translation is enabled.
              For more information, see Chapter 7, “Memory Management.”
28-29         Reserved
30      RI    Recoverable exception (for system reset and machine check exceptions).
              0 Exception is not recoverable.
              1 Exception is recoverable.
              For more information, see Section 6.4.1, “System Reset Exception (0x00100),” and Section 6.4.2,
              “Machine Check Exception (0x00200).”
31      LE    Little-endian mode enable
              0 The processor runs in big-endian mode.
              1 The processor runs in little-endian mode.

Those MSR bits that are written to SRR1 are written when the first instruction of the 
exception handler is encountered. The data address register (DAR) is used by several 
exceptions (for example, DSI and alignment exceptions) to identify the address of a 
memory element. 





6.2.1 Enabling and Disabling Exceptions 


When a condition exists that may cause an exception to be generated, it must be determined 
whether the exception is enabled for that condition as follows: 


• IEEE floating-point enabled exceptions (a type of program exception) are ignored
when both MSR[FE0] and MSR[FE1] are cleared. If either of these bits is set, all
IEEE enabled floating-point exceptions are taken and cause a program exception.


• Asynchronous, maskable exceptions (that is, the external and decrementer
interrupts) are enabled by setting the MSR[EE] bit. When MSR[EE] = 0, recognition
of these exception conditions is delayed. MSR[EE] is cleared automatically when an
exception is taken to delay recognition of conditions causing those exceptions.


• A machine check exception can only occur if the machine check enable bit,
MSR[ME], is set. If MSR[ME] is cleared, the processor goes directly into checkstop
state when a machine check exception condition occurs.
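The three enabling rules can be summarized as a single decision function. This is a hedged Python sketch: the exception-kind strings and the "take" default for every other exception are simplifications introduced here, since the list above covers only these three cases.

```python
# Hypothetical summary of the enabling rules in Section 6.2.1.
# Big-endian bit numbering: MSR bit n has mask 1 << (31 - n).
MSR_EE = 1 << (31 - 16)   # external interrupt enable
MSR_ME = 1 << (31 - 19)   # machine check enable
MSR_FE0 = 1 << (31 - 20)  # floating-point exception mode 0
MSR_FE1 = 1 << (31 - 23)  # floating-point exception mode 1


def disposition(msr: int, kind: str) -> str:
    """Return 'take', 'ignore', 'delay', or 'checkstop' for an exception condition."""
    if kind == "fp_enabled":
        return "take" if msr & (MSR_FE0 | MSR_FE1) else "ignore"
    if kind in ("external_interrupt", "decrementer"):
        return "take" if msr & MSR_EE else "delay"
    if kind == "machine_check":
        return "take" if msr & MSR_ME else "checkstop"
    return "take"  # simplification: other exceptions are not masked by these bits
```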
6.2.2 Steps for Exception Processing


After it is determined that the exception can be taken (by confirming that any instruction- 
caused exceptions occurring earlier in the instruction stream have been handled, and by 
confirming that the exception is enabled for the exception condition), the processor does 
the following: 


1. The machine status save/restore register 0 (SRR0) is loaded with an instruction
   address that depends on the type of exception. See the individual exception
   description for details about how this register is used for specific exceptions.


2. SRR1 bits 1-4 and 10-15 are loaded with information specific to the exception type.


3. SRR1 bits 16-23, 25-27, and 30-31 are loaded with a copy of the corresponding bits
   of the MSR. Note that, depending on the implementation, additional bits from the
   MSR may be saved in SRR1.


4. The MSR is set as described in Table 6-6. The new values take effect beginning with
   the fetching of the first instruction of the exception-handler routine located at the
   exception vector address.


   Note that MSR[IR] and MSR[DR] are cleared for all exception types; therefore,
   address translation is disabled for both instruction fetches and data accesses
   beginning with the first instruction of the exception-handler routine.


   Also, note that the MSR[ILE] bit setting at the time of the exception is copied to
   MSR[LE] when the exception is taken (as shown in Table 6-6).


5. Instruction fetch and execution resume, using the new MSR value, at a location
   specific to the exception type. The location is determined by adding the exception's
   vector offset (see Table 6-2) to the base address determined by MSR[IP]. If IP is
   cleared, exceptions are vectored to the physical address 0x000n_nnnn. If IP is set,
   exceptions are vectored to the physical address 0xFFFn_nnnn. For a machine check
   exception that occurs when MSR[ME] = 0 (machine check exceptions are disabled),
   the checkstop state is entered (the machine stops executing instructions). See
   Section 6.4.2, “Machine Check Exception (0x00200).”


In some implementations, any instruction fetch with MSR[IR] = 1 and any load or store
with MSR[DR] = 1 may cause SRR0 and SRR1 to be modified.
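The five steps can be combined into one function that computes the processor state on exception entry. This Python sketch is illustrative, not authoritative: take_exception and its arguments are hypothetical; the caller supplies the SRR0 value because it depends on the exception type; and the set of MSR bits cleared on entry is an assumption drawn from the text (EE, IR, DR, and RI cleared; LE copied from ILE; ME cleared on machine check) plus the other architectural defaults summarized in Table 6-6. Only the three vector offsets described in this section are listed.

```python
def bit(n: int) -> int:
    """Mask for big-endian bit n of a 32-bit register (bit 0 = MSB)."""
    return 1 << (31 - n)


MSR_SAVED = 0x0000FF73      # MSR bits 16-23, 25-27, 30-31 copied into SRR1
SRR1_EXC_INFO = 0x783F0000  # SRR1 bits 1-4 and 10-15: exception-specific info
ILE, ME, IP, LE = bit(15), bit(19), bit(25), bit(31)
# Assumption: POW, EE, PR, FP, FE0, SE, BE, FE1, IR, DR, and RI are cleared on
# every exception; ILE and IP are unchanged; ME is cleared only on machine check.
CLEARED = 0
for n in (13, 16, 17, 18, 20, 21, 22, 23, 26, 27, 30):
    CLEARED |= bit(n)

# Vector offsets for the exceptions described in this section (see Table 6-2).
VECTOR_OFFSETS = {"system_reset": 0x00100, "machine_check": 0x00200, "dsi": 0x00300}


def take_exception(msr: int, srr0: int, exc_info: int, kind: str):
    """Return (SRR0, SRR1, new MSR, vector address) for an exception of `kind`."""
    if kind == "machine_check" and not msr & ME:
        raise RuntimeError("checkstop: machine check taken with MSR[ME] = 0")
    srr1 = (exc_info & SRR1_EXC_INFO) | (msr & MSR_SAVED)  # steps 2 and 3
    new_msr = msr & ~CLEARED                                # step 4
    if kind == "machine_check":
        new_msr &= ~ME
    new_msr = (new_msr & ~LE) | (LE if msr & ILE else 0)     # LE <- ILE
    base = 0xFFF00000 if msr & IP else 0x00000000           # step 5
    return srr0, srr1, new_msr, base | VECTOR_OFFSETS[kind]
```

With MSR[IP] set, a DSI vectors to 0xFFF00300; with IP cleared, it vectors to 0x00000300.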
6.2.3 Returning from an Exception Handler


The Return from Interrupt (rfi) instruction performs context synchronization by allowing 
previously issued instructions to complete before returning to the interrupted process. 
Execution of the rfi instruction ensures the following: 


• All previous instructions have completed to a point where they can no longer cause
an exception.


If a previous instruction causes a direct-store interface error exception, the results
are determined before this instruction is executed. However, note that the
direct-store facility is being phased out of the architecture and will not likely be supported
in future devices.


• Previous instructions complete execution in the context (privilege, protection, and
address translation) under which they were issued.


• The rfi instruction copies SRR1 bits back into the MSR.


• The instructions following this instruction execute in the context established by this
instruction.
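The return path can be modeled as the inverse of the save step. This Python sketch assumes, as a simplification, that rfi restores the same MSR bit positions that exception entry saved (16-23, 25-27, and 30-31) and leaves the other MSR bits unchanged; the function name is hypothetical.

```python
MSR_RESTORED = 0x0000FF73  # assumed: MSR bits 16-23, 25-27, 30-31


def rfi(msr: int, srr0: int, srr1: int):
    """Return (new MSR, next instruction address) after a modeled rfi."""
    new_msr = (msr & ~MSR_RESTORED) | (srr1 & MSR_RESTORED)
    nia = srr0 & ~0x3  # SRR0 bits 30-31 are reserved; instructions are word-aligned
    return new_msr, nia
```

Exception-specific status held in SRR1 bits 1-4 and 10-15 falls outside the restore mask, so it never leaks back into the MSR.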


For a complete description of context synchronization, refer to Section 6.1.2.1, “Context 
Synchronization.” 


6.3 Process Switching 
The operating system should execute the following when processes are switched: 


• The sync instruction, which orders the effects of instruction execution. All
instructions previously initiated appear to have completed before the sync
instruction completes, and no subsequent instructions appear to be initiated until the
sync instruction completes.

• The isync instruction, which waits for all previous instructions to complete and then
discards any fetched instructions, causing subsequent instructions to be fetched (or
refetched) from memory and to execute in the context (privilege, translation,
protection, etc.) established by the previous instructions.

• The stwcx. instruction, to clear any outstanding reservation. This ensures that an
lwarx instruction in the old process is not paired with an stwcx. instruction in the
new process.
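Why the dummy stwcx. matters can be shown with a toy reservation model. In this hypothetical Python sketch, stwcx. succeeds whenever any reservation is held; that is a deliberate simplification chosen to expose the hazard, not a statement of exact architectural semantics. SCRATCH is a hypothetical OS-owned word, and sync/isync are not modeled because they carry no state here.

```python
class Reservation:
    """Hypothetical, simplified model of one processor's lwarx/stwcx. reservation.

    Simplification: stwcx. succeeds whenever any reservation is held, which is the
    hazard the operating system guards against at a process switch.
    """

    def __init__(self):
        self.addr = None

    def lwarx(self, memory: dict, addr: int) -> int:
        self.addr = addr  # establish a reservation
        return memory.get(addr, 0)

    def stwcx(self, memory: dict, addr: int, value: int) -> bool:
        """Store only if a reservation exists; always clear it. Returns success."""
        success = self.addr is not None
        if success:
            memory[addr] = value
        self.addr = None
        return success


SCRATCH = 0xFF0  # hypothetical OS-owned scratch word


def switch_process(res: Reservation, memory: dict) -> None:
    # The dummy stwcx. clears any reservation left behind by the old process.
    res.stwcx(memory, SCRATCH, 0)
```

Without switch_process, a reservation established by the old process's lwarx would let the new process's unrelated stwcx. succeed.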


The operating system should handle MSR[RI] as follows: 


• In machine check and system reset exception handlers—If the SRR1 bit
corresponding to MSR[RI] is cleared, the exception is not recoverable.


• In each exception handler—When enough state information has been saved that a
machine check or system reset exception can reconstruct the previous state, set
MSR[RI].


• At the end of each exception handler—Clear MSR[RI], set the SRR0 and SRR1
registers appropriately, and then execute rfi.
Note that the RI bit being set indicates that, with respect to the processor, enough processor
state data is valid for the processor to continue, but it does not guarantee that the interrupted 
process can resume. 
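The three RI rules above reduce to testing, setting, and clearing one bit. The helper names in this Python sketch are hypothetical; RI is bit 30 in both the MSR and SRR1, so its mask is 1 << (31 - 30).

```python
MSR_RI = 1 << (31 - 30)  # recoverable exception bit; same position in SRR1 (0x2)


def is_recoverable(srr1: int) -> bool:
    """Machine check / system reset handlers: SRR1[RI] cleared means not recoverable."""
    return bool(srr1 & MSR_RI)


def after_state_saved(msr: int) -> int:
    """Set MSR[RI] once enough state is saved to survive a nested reset or machine check."""
    return msr | MSR_RI


def before_rfi(msr: int) -> int:
    """Clear MSR[RI] before SRR0 and SRR1 are reloaded for the return."""
    return msr & ~MSR_RI
```

As the note above says, a true result from is_recoverable only means the processor state is valid; it does not guarantee that the interrupted process can resume.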


6.4 Exception Definitions 


Table 6-6 shows all the types of exceptions that can occur and certain MSR bit settings 
when the exception handler is invoked. Depending on the exception, certain of these bits 
are stored in SRR1 when an exception is taken. The following subsections describe each 
exception in detail. 


Table 6-6. MSR Setting Due to Exception 





0     Bit is cleared
1     Bit is set
ILE   Bit is copied from the ILE bit in the MSR.
—     Bit is not altered
Reading of reserved bits may return 0, even if the value last written to them was 1.
6.4.1 System Reset Exception (0x00100)


The system reset exception is a nonmaskable, asynchronous exception signaled to the 
processor typically through the assertion of a system-defined signal; see Table 6-7. 


Table 6-7. System Reset Exception—Register Settings

Register  Setting Description
SRR0      Set to the effective address of the instruction that the processor would have attempted to execute next if
          no exception conditions were present.
SRR1      1-4     Cleared
          10-15   Cleared
          16-23   Loaded with equivalent bits from the MSR
          25-27   Loaded with equivalent bits from the MSR
          30      Loaded from the equivalent MSR bit, MSR[RI], if the exception is recoverable;
                  otherwise cleared.
          31      Loaded with equivalent bit from the MSR
          Note that depending on the implementation, additional bits in the MSR may be copied to SRR1.
          If the processor state is corrupted to the extent that execution cannot resume reliably, the bit
          corresponding to MSR[RI] (SRR1[30]) is cleared.
MSR       DR  0
          RI  0
          LE  Set to value of ILE





When a system reset exception is taken, instruction execution continues at offset 0x00100 
from the physical base address determined by MSR[IP]. 


If the exception is recoverable, the value of the MSR[RI] bit is copied to the corresponding
SRR1 bit. The exception functions as a context-synchronizing operation. If a reset
exception causes the loss of:


• an external exception (interrupt or decrementer),


• a direct-store error type DSI (the direct-store facility is being phased out of the
architecture and is not likely to be supported in future devices), or


• a floating-point enabled type program exception,


then the exception is not recoverable. If the SRR1 bit corresponding to MSR[RI] is cleared, 
the exception is context-synchronizing only with respect to subsequent instructions. Note 
that each implementation provides a means for software to distinguish between power-on 
reset and other types of system resets (such as soft reset). 
6.4.2 Machine Check Exception (0x00200)


If no higher-priority exception is pending (namely, a system reset exception), the processor 
initiates a machine check exception when the appropriate condition is detected. Note that 
the causes of machine check exceptions are implementation- and system-dependent, and 
are typically signalled to the processor by the assertion of a specified signal on the 
processor interface. 


When a machine check condition occurs and MSR[ME] = 1, the exception is recognized 
and handled. If MSR[ME] = 0 and a machine check occurs, the processor generates an 
internal checkstop condition. When a processor is in checkstop state, instruction processing 
is suspended and generally cannot continue without resetting the processor. Some 
implementations may preserve some or all of the internal state of the processor when 
entering the checkstop state, so that the state can be analyzed as an aid in problem 
determination. 


In general, it is expected that a bus error signal would be used by a memory controller to 
indicate a memory parity error or an uncorrectable memory ECC error. Note that the 
resulting machine check exception has priority over any exceptions caused by the 
instruction that generated the bus operation. 


If a machine check exception causes an exception that is not context-synchronizing, the 
exception is not recoverable. Also, a machine check exception is not recoverable if it causes 
the loss of one of the following: 


• An external exception (interrupt or decrementer)


• Direct-store error type DSI (the direct-store facility is being phased out of the
architecture and is not likely to be supported in future devices)


• Floating-point enabled type program exception


If the SRR1 bit corresponding to MSR[RI] is cleared, the exception is
context-synchronizing only with respect to subsequent instructions. If the exception is recoverable,
the SRR1 bit corresponding to MSR[RI] is set and the exception is context-synchronizing.


Note that if the error is caused by the memory subsystem, incorrect data could be loaded 
into the processor and register contents could be corrupted regardless of whether the 
exception is considered recoverable by the SRR1 bit corresponding to MSR[RI]. 
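The conditions above can be folded into one classification. In this Python sketch the function is hypothetical, and the `lost` argument is a modeling input naming the pending exceptions the machine check displaced; real software sees only the resulting SRR1[RI] value. The result "recoverable" still carries the caveat above: memory-subsystem errors may already have corrupted loaded data.

```python
MSR_ME = 1 << (31 - 19)   # machine check enable (0x1000)
SRR1_RI = 1 << (31 - 30)  # SRR1 bit corresponding to MSR[RI] (0x2)
# Pending exceptions whose loss makes a machine check unrecoverable.
FATAL_LOSSES = {"external_interrupt", "decrementer", "direct_store_dsi", "fp_enabled_program"}


def machine_check_outcome(msr: int, srr1: int, lost=()) -> str:
    """Classify a machine check; `lost` names exceptions it displaced (hypothetical input)."""
    if not msr & MSR_ME:
        return "checkstop"
    if not srr1 & SRR1_RI or FATAL_LOSSES.intersection(lost):
        return "unrecoverable"
    # Even here, memory-subsystem errors may already have corrupted loaded data.
    return "recoverable"
```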


On some implementations, a machine check exception may be caused by referring to a 
nonexistent physical (real) address, either because translation is disabled (MSR[IR] or 
MSR[DR] = 0) or through an invalid translation. On such a system, execution of the dcbz 
or dcba instruction can cause a delayed machine check exception by introducing a block
into the data cache that is associated with an invalid physical (real) address. A machine 
check exception could eventually occur when and if a subsequent attempt is made to store 
that block to memory (for example, as the block becomes the target for replacement, or as 
the result of executing a dcbst instruction). 
When a machine check exception is taken, registers are updated as shown in Table 6-8.


Table 6-8. Machine Check Exception—Register Settings


Register  Setting Description
SRR0      On a best-effort basis, implementations can set this to an EA of some instruction that was
          executing or about to be executing when the machine check condition occurred.
SRR1      Bit 30 is loaded from MSR[RI] if the processor is in a recoverable state; otherwise, it is cleared.
          The setting of all other SRR1 bits is implementation-dependent.
MSR       ME  0*
          DR  0
          RI  0
          LE  Set to value of ILE


* Note that when a machine check exception is taken, the exception handler should set MSR[ME] as soon
as it is practical to handle another machine check exception. Otherwise, subsequent machine check
exceptions cause the processor to automatically enter the checkstop state.





If MSR[RI] is set, the machine check exception may still be unrecoverable in the sense that 
execution cannot resume in the same context that existed before the exception. 


When a machine check exception is taken, instruction execution resumes at offset 0x00200 
from the physical base address determined by MSR[IP]. 


6.4.3 DSI Exception (0x00300) 


A DSI exception occurs when no higher priority exception exists and a data memory access 
cannot be performed. The condition that caused the DSI exception can be determined by 
reading the DSISR, a supervisor-level SPR (SPR18) that can be read by using the mfspr 
instruction. Bit settings are provided in Table 6-9. Table 6-9 also indicates which memory 
element is pointed to by the DAR. DSI exceptions can be generated by load/store 
instructions, cache-control instructions (icbi, dcbi, dcbz, dcbst, and dcbf), or the 
eciwx/ecowx instructions for any of the following reasons: 


• A load or a store instruction results in a direct-store error exception. Note that the 
direct-store facility is being phased out of the architecture and is not likely to be 
supported in future devices. 


• The effective address cannot be translated. That is, there is a page fault for this 
portion of the translation, so a DSI exception must be taken to retrieve the 
translation, for example from a storage device such as a hard disk drive. 


• The instruction is not supported for the type of memory addressed. 


— For lwarx/stwcx. instructions that reference a memory location that is 
write-through-required. If the exception is not taken, the instructions execute correctly. 


— For lwarx/stwcx. or eciwx/ecowx instructions that attempt to access direct-store 
segments (the direct-store facility is being phased out of the architecture and is not 
likely to be supported in future devices). If the exception does not occur, the results are 
boundedly undefined. 

• The access violates memory protection. 


• The execution of an eciwx or ecowx instruction is disallowed because the external 
access register enable bit (EAR[E]) is cleared. 


• A data address breakpoint register (DABR) match occurs. The DABR facility is 
optional to the PowerPC architecture, but if one is implemented, it is recommended, 
but not required, that it be implemented as follows. A data address breakpoint match 
is detected for a load or store instruction if the following three conditions are met for 
any byte accessed: 


— EA[0-28] = DABR[DAB] 

— MSR[DR] = DABR[BT] 

— The instruction is a store and DABR[DW] = 1, or the instruction is a load and 
DABR[DR] = 1. 


The DABR is described in Section 2.3.14, “Data Address Breakpoint Register 
(DABR).” DAR settings are described in Table 6-9. If the above conditions are 
satisfied, it is undefined whether a match occurs in the following cases: 


— The instruction is store conditional but the store is not performed. 
— The instruction is a load/store string of zero length. 
— The instruction is dcbz, eciwx, or ecowx. 


The cache management instructions other than dcbz never cause a match. If dcbz 
causes a match, some or all of the target memory locations may have been updated. 
For the purpose of determining whether a match occurs, eciwx is treated as a load, 
and ecowx and dcbz are treated as stores. 
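The recommended DABR match test can be sketched in C as follows. The DABR field layout used here (DAB in bits 0-28, BT in bit 29, DW in bit 30, DR in bit 31, IBM numbering) is an assumption to be checked against Section 2.3.14, and the helper names are hypothetical. The sketch does not model the cases listed above for which a match is undefined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 32-bit DABR layout (IBM numbering, bit 0 = MSB). */
#define DABR_DAB_MASK UINT32_C(0xFFFFFFF8)  /* bits 0-28: double-word address */
#define DABR_BT       UINT32_C(0x00000004)  /* bit 29: compared with MSR[DR] */
#define DABR_DW       UINT32_C(0x00000002)  /* bit 30: match on stores */
#define DABR_DR       UINT32_C(0x00000001)  /* bit 31: match on loads */

/* Applies the three match conditions to a single accessed byte at 'ea'. */
static bool dabr_byte_matches(uint32_t dabr, uint32_t ea, bool msr_dr, bool is_store)
{
    bool addr_ok = (ea & DABR_DAB_MASK) == (dabr & DABR_DAB_MASK); /* EA[0-28] = DABR[DAB] */
    bool bt_ok   = msr_dr == ((dabr & DABR_BT) != 0);              /* MSR[DR] = DABR[BT] */
    bool type_ok = is_store ? (dabr & DABR_DW) != 0                /* store and DW = 1 */
                            : (dabr & DABR_DR) != 0;               /* load and DR = 1 */
    return addr_ok && bt_ok && type_ok;
}

/* An access matches if the conditions hold for any byte it touches. */
static bool dabr_access_matches(uint32_t dabr, uint32_t ea, uint32_t len,
                                bool msr_dr, bool is_store)
{
    for (uint32_t i = 0; i < len; i++) {
        if (dabr_byte_matches(dabr, ea + i, msr_dr, is_store))
            return true;
    }
    return false;
}
```

For example, with DABR = 0x00001002 (DAB = 0x00001000, DW = 1, BT = 0) and MSR[DR] = 0, a 4-byte store to 0x00000FFE matches because two of its bytes fall in the watched double word, while a 4-byte load from 0x00001004 does not match because DABR[DR] = 0.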


If an stwcx. instruction has an EA for which a normal store operation would cause a DSI 
exception but the processor does not have the reservation from lwarx, whether a DSI 
exception is taken is implementation-dependent. 


If the value in XER[25-31] indicates that a load or store string instruction has a length of 
zero, a DSI exception does not occur, regardless of the effective address. 


The condition that caused the exception is defined in the DSISR. As shown in Table 6-9, 
this exception also sets the data address register (DAR). 


Table 6-9. DSI Exception—Register Settings 


Register  Setting Description
SRR0      Set to the effective address of the instruction that caused the exception.
SRR1      1-4    Cleared
          10-15  Cleared
          16-23  Loaded with equivalent bits from the MSR
          25-27  Loaded with equivalent bits from the MSR
          30-31  Loaded with equivalent bits from the MSR
          Note that depending on the implementation, additional bits in the MSR may be copied to SRR1.
MSR       DR  0
          RI  0
          LE  Set to value of ILE
DSISR     0      Set if a load or store instruction results in a direct-store error exception;
                 otherwise cleared. Note that the direct-store facility is being phased out of the
                 architecture and is not likely to be supported in future devices.
          1      Set if the translation of an attempted access is not found in the primary hash
                 table entry group (HTEG), in the rehashed secondary HTEG, or in the range of a DBAT
                 register (page fault condition); otherwise cleared.
          2-3    Cleared
          4      Set if a memory access is not permitted by the page or DBAT protection mechanism;
                 otherwise cleared.
          5      Set if an eciwx, ecowx, lwarx, or stwcx. instruction attempts to access
                 direct-store interface space, or if an lwarx or stwcx. instruction is used with
                 addresses that are marked as write-through; otherwise cleared. Note that the
                 direct-store facility is being phased out of the architecture and is not likely to
                 be supported in future devices.
          6      Set for a store operation and cleared for a load operation.
          7-8    Cleared
          9      Set if a DABR match occurs; otherwise cleared.
          10     Cleared
          11     Set if the instruction is an eciwx or ecowx and EAR[E] = 0; otherwise cleared.
          12-31  Cleared
          Because a single instruction can cause multiple exception conditions, the following
          combinations of DSISR bits may be set concurrently:
          • Bits 1 and 11
          • Bits 4 and 5
          • Bits 4 and 11
          • Bits 5 and 11
          Additionally, bit 6 is set if the instruction that caused the exception is a store,
          ecowx, dcbz, dcba, or dcbi and bit 6 would otherwise be cleared. Also, bit 9 (DABR
          match) may be set alone, in combination with any other bit, or with any of the other
          combinations shown above.
DAR       Set to the effective address of a memory element as described in the following list:
          • A byte in the first word accessed in the segment or BAT area that caused the DSI
            exception, for a byte, half-word, or word memory access (to a segment or BAT area).
          • A byte in the first double word accessed in the segment or BAT area that caused the
            DSI exception, for a double-word memory access (to a segment or BAT area).
          • A byte in the block that caused the exception, for a cache management instruction.
          • Any EA in the memory range addressed (for direct-store error exceptions). Note that
            the direct-store facility is being phased out of the architecture and is not likely
            to be supported in future devices.
          • The EA computed by the instruction, for the attempted execution of an eciwx or ecowx
            instruction when EAR[E] is cleared.
          • If the exception is caused by a DABR match, the effective address of any byte in the
            range from A to B inclusive, where A is the effective address of the word (for a
            byte, half-word, or word access) or double word (for a double-word access) specified
            by the EA computed by the instruction, and B is the EA of the last byte in the word
            or double word in which the match occurred.





When a DSI exception is taken, instruction execution resumes at offset 0x00300 from the 
physical base address determined by MSR[IP]. 
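A DSI handler typically reads the DSISR (SPR 18, with mfspr) and dispatches on the bits listed in Table 6-9. The following C sketch shows only the decoding step on a value that has already been read; the cause table mirrors Table 6-9 under IBM bit numbering, and the function and array names are hypothetical. Note that bit 6 qualifies the access as a store rather than being a cause by itself.

```c
#include <stdint.h>
#include <stdio.h>

/* IBM bit numbering: bit 0 is the MSB of the 32-bit DSISR. */
#define DSISR_BIT(n) (UINT32_C(1) << (31u - (n)))

/* DSISR bits that may be set by a DSI exception (see Table 6-9). */
static const struct {
    unsigned bit;
    const char *meaning;
} dsi_bits[] = {
    { 0,  "direct-store error" },
    { 1,  "translation not found (page fault)" },
    { 4,  "protection violation" },
    { 5,  "lwarx/stwcx./eciwx/ecowx not supported for this memory" },
    { 6,  "access was a store" },
    { 9,  "DABR match" },
    { 11, "eciwx/ecowx with EAR[E] = 0" },
};

/* Prints each DSISR bit that is set and returns how many were found. */
static int dsi_describe(uint32_t dsisr)
{
    int found = 0;
    for (size_t i = 0; i < sizeof dsi_bits / sizeof dsi_bits[0]; i++) {
        if (dsisr & DSISR_BIT(dsi_bits[i].bit)) {
            printf("DSISR[%u]: %s\n", dsi_bits[i].bit, dsi_bits[i].meaning);
            found++;
        }
    }
    return found;
}
```

For example, a store that takes a page fault would leave DSISR = 0x42000000 (bits 1 and 6), and dsi_describe reports both bits.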
6.4.4 ISI Exception (0x00400) 


An ISI exception occurs when no higher priority exception exists and an attempt to fetch 
the next instruction to be executed fails for any of the following reasons: 


• The effective address cannot be translated. For example, when there is a page fault 
for this portion of the translation, an ISI exception must be taken to retrieve the page 
(and possibly the translation), typically from a storage device. 


• An attempt is made to fetch an instruction from a no-execute segment. 
• An attempt is made to fetch an instruction from guarded memory and MSR[IR] = 1. 
• The fetch access violates memory protection. 


• An attempt is made to fetch an instruction from a direct-store segment. Note that the 
direct-store facility is being phased out of the architecture and is not likely to be 
supported in future devices. 


Register settings for ISI exceptions are shown in Table 6-10. 


Table 6-10. ISI Exception—Register Settings 


Register  Setting Description
SRR0      Set to the effective address of the instruction that the processor would have attempted to
          execute next if no exception conditions were present (if the exception occurs on
          attempting to fetch a branch target, SRR0 is set to the branch target address).
SRR1      1      Set if the translation of an attempted access is not found in the primary hash
                 table entry group (HTEG), in the rehashed secondary HTEG, or in the range of an
                 IBAT register (page fault condition); otherwise cleared.
          2      Cleared
          3      Set if the fetch access occurs to a direct-store segment (SR[T] = 1), to a
                 no-execute segment (N bit set in segment descriptor), or to guarded memory when
                 MSR[IR] = 1; otherwise cleared. Note that the direct-store facility is being
                 phased out of the architecture and is not likely to be supported in future devices.
          4      Set if a memory access is not permitted by the page or IBAT protection mechanism,
                 described in Chapter 7, “Memory Management”; otherwise cleared.
          10-15  Cleared
          16-23  Loaded with equivalent bits from the MSR
          25-27  Loaded with equivalent bits from the MSR
          30-31  Loaded with equivalent bits from the MSR
          Note that only one of bits 1, 3, and 4 can be set. Also, note that depending on the
          implementation, additional bits in the MSR may be copied to SRR1.
MSR       DR  0
          RI  0
          LE  Set to value of ILE





When an ISI exception is taken, instruction execution resumes at offset 0x00400 from the 
physical base address determined by MSR[IP]. 
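Because only one of SRR1 bits 1, 3, and 4 can be set for an ISI exception, a handler can classify the fault with a short sequence of tests. This hypothetical C sketch assumes IBM bit numbering. Table 6-10 lists no DAR update, so the address of the failed fetch is taken from SRR0.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of SRR1. */
#define ISI_SRR1_BIT(n) (UINT32_C(1) << (31u - (n)))

enum isi_cause {
    ISI_NONE,
    ISI_PAGE_FAULT,  /* SRR1[1]: no HTEG or IBAT translation found */
    ISI_NO_FETCH,    /* SRR1[3]: direct-store, no-execute, or guarded fetch */
    ISI_PROTECTION,  /* SRR1[4]: page or IBAT protection violation */
};

/* Only one of SRR1 bits 1, 3, and 4 can be set, so the first match decides. */
static enum isi_cause isi_classify(uint32_t srr1)
{
    if (srr1 & ISI_SRR1_BIT(1)) return ISI_PAGE_FAULT;
    if (srr1 & ISI_SRR1_BIT(3)) return ISI_NO_FETCH;
    if (srr1 & ISI_SRR1_BIT(4)) return ISI_PROTECTION;
    return ISI_NONE;
}
```

The bits copied from the MSR (16-23, 25-27, and 30-31) do not overlap bits 1, 3, and 4, so they do not affect the classification.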
6.4.5 External Interrupt (0x00500) 


An external interrupt exception is signaled to the processor by the assertion of the external 
interrupt signal. The exception may be delayed by other higher priority exceptions or if 
MSR[EE] is zero when the exception is detected. Note that the occurrence of this 
exception does not cancel the external request. 


The register settings for the external interrupt exception are shown in Table 6-11. 


Table 6-11. External Interrupt—Register Settings 
Register  Setting Description
SRR0      Set to the effective address of the instruction that the processor would have attempted to
          execute next if no interrupt conditions were present.
SRR1      1-4    Cleared
          10-15  Cleared
          16-23  Loaded with equivalent bits from the MSR
          25-27  Loaded with equivalent bits from the MSR
          30-31  Loaded with equivalent bits from the MSR
          Note that depending on the implementation, additional bits in the MSR may be copied to SRR1.
MSR       DR  0
          RI  0
          LE  Set to value of ILE


When an external interrupt exception is taken, instruction execution resumes at offset 
0x00500 from the physical base address determined by MSR[IP]. 
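Most of the register-settings tables in this section share a pattern: SRR1 bits 16-23, 25-27, and 30-31 are copied from the MSR, and the new MSR has DR = 0, RI = 0, and LE set to the value of ILE. The C sketch below models only those listed updates, plus the MSR[EE] check that gates external interrupts. The MSR bit positions (ILE = 15, EE = 16, DR = 27, RI = 30, LE = 31, IBM numbering) are assumptions not stated in this section, other MSR bits are deliberately left unchanged as a simplification, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of the 32-bit MSR. */
static uint32_t msr_bit(unsigned n)
{
    return UINT32_C(1) << (31u - n);
}

/* Assumed 32-bit MSR bit positions. */
enum { MSR_ILE = 15, MSR_EE = 16, MSR_DR = 27, MSR_RI = 30, MSR_LE = 31 };

/* SRR1 bits 16-23, 25-27, and 30-31 are loaded from the MSR; all others
 * are left clear in this model. */
static const uint32_t SRR1_MSR_COPY_MASK = UINT32_C(0x0000FF73);

/* External interrupts are recognized only while MSR[EE] = 1. */
static bool external_interrupt_enabled(uint32_t msr)
{
    return (msr & msr_bit(MSR_EE)) != 0;
}

/* Saves MSR bits into SRR1, then sets DR = 0, RI = 0, and LE = ILE. */
static void take_exception(uint32_t *msr, uint32_t *srr1)
{
    *srr1 = *msr & SRR1_MSR_COPY_MASK;
    uint32_t m = *msr & ~(msr_bit(MSR_DR) | msr_bit(MSR_RI) | msr_bit(MSR_LE));
    if (m & msr_bit(MSR_ILE))
        m |= msr_bit(MSR_LE);
    *msr = m;
}
```

For example, starting from MSR = 0x00018032 (ILE, EE, IR, DR, and RI set), the sketch saves SRR1 = 0x00008032 and leaves MSR = 0x00018021, with DR and RI cleared and LE copied from ILE.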





6.4.6 Alignment Exception (0x00600) 


This section describes conditions that can cause alignment exceptions in the processor. 
Similar to DSI exceptions, alignment exceptions use SRR0 and SRR1 to save the 
machine state and the DSISR to determine the source of the exception. An alignment 
exception occurs when no higher priority exception exists and the implementation cannot 
perform a memory access for one of the following reasons: 


• The operand of a floating-point load or store instruction is not word-aligned. 
• The operand of lmw, stmw, lwarx, stwcx., eciwx, or ecowx is not aligned. 


• The instruction is lmw, stmw, lswi, lswx, stswi, or stswx and the processor is in 
little-endian mode. 


• The operand of an elementary or string load or store crosses a protection boundary. 


• The operand of lmw or stmw crosses a segment or BAT boundary. 

• The operand of dcbz is in memory that is write-through-required or caching-inhibited, 
or dcbz is executed in an implementation that has either no data cache or a 
write-through data cache. 


• The operand of a floating-point load or store instruction is in a direct-store segment 
(T = 1). Note that the direct-store facility is being phased out of the architecture and 
is not likely to be supported in future devices. 


For lmw, stmw, lswi, lswx, stswi, and stswx instructions in little-endian mode, an 
alignment exception always occurs. For lmw and stmw instructions with an operand that is 
not aligned in big-endian mode, and for lwarx, stwcx., eciwx, and ecowx with an operand 
that is not aligned in either endian mode, an implementation may yield boundedly 
undefined results instead of causing an alignment exception (for eciwx and ecowx when 
EAR[E] = 0, a third alternative is to cause a DSI exception). For all other cases listed above, 
an implementation may execute the instruction correctly instead of causing an alignment 
exception. For the dcbz instruction, correct execution means clearing each byte of the block 
in main memory. See Section 3.1, “Data Organization in Memory and Data Transfers,” for 
a complete definition of alignment in the PowerPC architecture. 


The term “protection boundary” refers to the boundary between protection domains. A 
protection domain is a segment, a block of memory defined by a BAT entry, a virtual 
4-Kbyte page, or a range of unmapped effective addresses. Protection domains are defined 
only when the corresponding address translation (instruction or data) is enabled (MSR[IR] 
or MSR[DR] = 1). 
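The protection-boundary definition can be made concrete with a small check. This C sketch treats 4-Kbyte pages as the only protection domain (segment, BAT, and unmapped-range boundaries are ignored) and assumes operand lengths that are powers of two; the check is meaningful only when the relevant address translation is enabled. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_4K UINT32_C(0x1000)

/* True if a 'len'-byte operand at 'ea' is naturally aligned.
 * Assumes len is a power of two (1, 2, 4, or 8). */
static bool naturally_aligned(uint32_t ea, uint32_t len)
{
    return (ea & (len - 1u)) == 0;
}

/* True if the operand's first and last bytes lie in different 4-Kbyte pages.
 * Only page boundaries are checked here. */
static bool crosses_4k_page(uint32_t ea, uint32_t len)
{
    if (len == 0)
        return false;
    uint32_t last = ea + (len - 1u);
    return (ea / PAGE_SIZE_4K) != (last / PAGE_SIZE_4K);
}
```

For example, a 4-byte load at 0x00001FFE is misaligned and spans the pages at 0x00001000 and 0x00002000, so with data translation enabled it crosses a protection boundary; a 4-byte load at 0x00001FFC is aligned and stays within one page.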


The register settings for alignment exceptions are shown in Table 6-12. 


Table 6-12. Alignment Exception—Register Settings 


Register  Setting Description
SRR0      Set to the effective address of the instruction that caused the exception.
SRR1      1-4    Cleared
          10-15  Cleared
          16-23  Loaded with equivalent bits from the MSR
          25-27  Loaded with equivalent bits from the MSR
          30-31  Loaded with equivalent bits from the MSR
          Note that depending on the implementation, additional bits in the MSR may be copied to SRR1.
MSR       DR  0
          RI  0
          LE  Set to value of ILE
DSISR     0-14   Cleared
          15-16  For instructions that use register indirect with index addressing, set to bits
                 29-30 of the instruction encoding. For instructions that use register indirect
                 with immediate index addressing, cleared.
          17     For instructions that use register indirect with index addressing, set to bit 25
                 of the instruction encoding. For instructions that use register indirect with
                 immediate index addressing, set to bit 5 of the instruction encoding.
          18-21  For instructions that use register indirect with index addressing, set to bits
                 21-24 of the instruction encoding. For instructions that use register indirect
                 with immediate index addressing, set to bits 1-4 of the instruction encoding.
          22-26  Set to bits 6-10 (identifying either the source or destination) of the
                 instruction encoding. Undefined for dcbz.
          27-31  Set to bits 11-15 of the instruction encoding (rA) for update-form instructions.
                 Set to either bits 11-15 of the instruction encoding or to any register number
                 not in the range of registers loaded by a valid form instruction for lmw, lswi,
                 and lswx instructions. Otherwise undefined.
          Note that for load or store instructions that use register indirect with index
          addressing, the DSISR can be set to the same value that would have resulted if the
          corresponding instruction that uses register indirect with immediate index addressing
          had caused the exception. Similarly, for load or store instructions that use register
          indirect with immediate index addressing, the DSISR can hold a value that would have
          resulted from an instruction that uses register indirect with index addressing. For
          example, a misaligned lwarx instruction that crosses a protection boundary would
          normally cause the DSISR to be set to the following binary value:

          If there is no corresponding instruction, no alternative value can be specified.

          The instruction pairs that can use the same DSISR values are as follows:
          lbz/lbzx     lbzu/lbzux   lhz/lhzx     lhzu/lhzux   lha/lhax     lhau/lhaux
          lwz/lwzx     lwzu/lwzux   lwa/lwax     stb/stbx     stbu/stbux   sth/sthx
          sthu/sthux   stw/stwx     stwu/stwux   lfs/lfsx     lfsu/lfsux   stfs/stfsx
          stfsu/stfsux
DAR       Set to the EA of the data access as computed by the instruction causing the alignment
          exception.





The architecture does not support the use of a misaligned EA by load/store with reservation 
instructions or by the eciwx and ecowx instructions. If one of these instructions specifies a 
misaligned EA, the exception handler should not emulate the instruction but should treat 
the occurrence as a programming error. 




6.4.6.1 Integer Alignment Exceptions 

Operations that are not naturally aligned may suffer performance degradation, depending 
on the processor design, the type of operation, the boundaries crossed, and the mode that 
the processor is in during execution. More specifically, these operations may either cause 
an alignment exception or they may cause the processor to break the memory access into 
multiple, smaller accesses with respect to the cache and the memory subsystem. 


6.4.6.1.1 Page Address Translation Access Considerations 

A page address translation access occurs when MSR[DR] is set, SR[T] is cleared, and there 
is no BAT match. Note that a dcbz instruction causes an alignment exception if the access 
is to a page or block with the W (write-through) or I (cache-inhibit) bit set. 


Misaligned memory accesses that do not cause an alignment exception may not perform as 
well as an aligned access of the same type. The resulting performance degradation due to 
misaligned accesses depends on how well each individual access behaves with respect to 
the memory hierarchy. 


Particular details regarding page address translation are implementation-dependent; the 
reader should consult the user’s manual for the appropriate processor for more information. 


6.4.6.1.2 Direct-Store Interface Access Considerations 
The following apply for direct-store interface accesses: 


• If a 256-Mbyte boundary will be crossed by any portion of the direct-store interface 
space accessed by an instruction (the entire string for strings/multiples), an 
alignment exception is taken. 


• Floating-point loads and stores to direct-store segments may cause an alignment 
exception, regardless of operand alignment. 
• The load/store word with reservation instructions that map into a direct-store 
segment always cause a DSI exception. However, if the instruction crosses a 
segment boundary, an alignment exception is taken instead. 


Note that the direct-store facility is being phased out of the architecture and is not likely to 
be supported in future devices. 


6.4.6.2 Little-Endian Mode Alignment Exceptions 

The OEA allows implementations to take alignment exceptions on misaligned accesses (as 
described in Section 3.1.4, “PowerPC Byte Ordering”) in little-endian mode but does not 
require them to do so. Some implementations may perform some misaligned accesses 
without taking an alignment exception. 
6.4.6.3 Interpretation of the DSISR as Set by an Alignment Exception 


For most alignment exceptions, an exception handler may be designed to emulate the 
instruction that causes the exception. To do this, the handler requires the following 
characteristics of the instruction: 


• Load or store 

• Length (half word or word) 

• String, multiple, or normal load/store 

• Integer or floating-point 

• Whether the instruction performs update 

• Whether the instruction performs byte reversal 

• Whether it is a dcbz instruction 

The PowerPC architecture provides this information implicitly, by setting opcode bits in the 
DSISR that identify the excepting instruction type. The exception handler does not need to 
load the excepting instruction from memory. The mapping for all exception possibilities is 
unique except for the few exceptions discussed below. 


Table 6-13 shows the inverse mapping—how the DSISR bits identify the instruction that 
caused the exception. 


The alignment exception handler cannot distinguish whether a floating-point load or store 
caused the exception because it is misaligned or because it addresses the direct-store 
interface space. However, this does not matter; in either case, it is emulated with integer 
instructions. Note that the direct-store facility is being phased out of the architecture and is 
not likely to be supported in future devices. 


Table 6-13. DSISR[15-21] Settings to Determine Misaligned Instruction 


The instructions lwz and lwarx give the same DSISR bits (all zero). But if lwarx causes an 
alignment exception, it is an invalid form, so it need not be emulated in any precise way. It is 
adequate for the alignment exception handler to simply emulate the instruction as if it were an 
lwz. It is important that the emulator use the address in the DAR, rather than computing it 
from rA/rB/D, because lwz and lwarx use different addressing modes. 


If opcode 0 (“illegal or reserved”) can cause an alignment exception, it will be indistinguishable 
to the exception handler from lwarx and lwz. 


These instructions are distinguished by DSISR[12-13], which are not shown in this table. 
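The DSISR[15-31] rules in Table 6-12 can be expressed as a function of the instruction word, which also shows why lwz and lwarx yield the same DSISR[15-21] value (all zero). This C sketch assumes that primary opcode 31 identifies register indirect with index addressing and that every other opcode uses register indirect with immediate index addressing; it also fills DSISR[27-31] with rA even where Table 6-12 leaves that field undefined. The test encodings (for example, lwarx as primary opcode 31 with extended opcode 20) are standard PowerPC encodings; the function names are hypothetical.

```c
#include <stdint.h>

/* Extracts instruction bits a..b (IBM numbering, bit 0 = MSB). */
static uint32_t insn_field(uint32_t insn, unsigned a, unsigned b)
{
    return (insn >> (31u - b)) & ((UINT32_C(1) << (b - a + 1u)) - 1u);
}

/* Builds DSISR[15-31] for an alignment exception, following Table 6-12. */
static uint32_t alignment_dsisr(uint32_t insn)
{
    uint32_t f15_16, f17, f18_21;
    if (insn_field(insn, 0, 5) == 31) {        /* register indirect with index */
        f15_16 = insn_field(insn, 29, 30);
        f17    = insn_field(insn, 25, 25);
        f18_21 = insn_field(insn, 21, 24);
    } else {                                   /* register indirect with immediate index */
        f15_16 = 0;
        f17    = insn_field(insn, 5, 5);
        f18_21 = insn_field(insn, 1, 4);
    }
    return (f15_16 << 15)                      /* DSISR bits 15-16 */
         | (f17 << 14)                         /* DSISR bit 17 */
         | (f18_21 << 10)                      /* DSISR bits 18-21 */
         | (insn_field(insn, 6, 10) << 5)      /* DSISR bits 22-26: rS/rD */
         | insn_field(insn, 11, 15);           /* DSISR bits 27-31: rA */
}

/* DSISR[15-21] as a 7-bit value, the key used by Table 6-13. */
static uint32_t dsisr_15_21(uint32_t dsisr)
{
    return (dsisr >> 10) & 0x7Fu;
}
```

For example, for lwzu r3,4(r4) the sketch yields 0x00004064: bit 17 comes from instruction bit 5, bits 22-26 hold rD = 3, and bits 27-31 hold rA = 4.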
6.4.7 Program Exception (0x00700) 


A program exception occurs when no higher priority exception exists and one or more of 
the following exception conditions, which correspond to bit settings in SRR1, occur during 
execution of an instruction: 


• System IEEE floating-point enabled exception—A system IEEE floating-point 
enabled exception can be generated when FPSCR[FEX] is set and either (or both) 
of the MSR[FE0] or MSR[FE1] bits is set. 


FPSCR[FEX] is set by the execution of a floating-point instruction that causes an 
enabled exception or by the execution of a “move to FPSCR” type instruction that 
sets an exception bit when its corresponding enable bit is set. Floating-point 
exceptions are described in Section 3.3.6, “Floating-Point Program Exceptions.” 


• Illegal instruction—An illegal instruction program exception is generated when 
execution of an instruction is attempted with an illegal opcode or illegal combination 
of opcode and extended opcode fields (these include PowerPC instructions not 
implemented in the processor), or when execution of an optional or a reserved 
instruction not provided in the processor is attempted. 


Note that implementations are permitted to generate an illegal instruction program 
exception when encountering the following instructions. If an illegal instruction 
exception is not generated, then the alternative is shown in parentheses. 


— An instruction corresponds to an invalid class (the results may be boundedly 
undefined) 


— An lswx instruction for which rA or rB is in the range of registers to be loaded 
(may cause results that are boundedly undefined) 


— A move to/from SPR instruction with an SPR field that does not contain one of 
the defined values 


  — MSR[PR] = 1 and spr[0] = 1 (this can cause a privileged instruction program 
  exception) 


  — MSR[PR] = 0 or spr[0] = 0 (may cause boundedly undefined results) 


— An unimplemented floating-point instruction that is not optional (may cause a 
floating-point assist exception) 


• Privileged instruction—A privileged instruction type program exception is 
generated when the execution of a privileged instruction is attempted and the 
processor is operating in user mode (MSR[PR] is set). It is also generated for mtspr 
or mfspr instructions whose SPR field contains one of the defined values having 
spr[0] = 1 when MSR[PR] = 1. Some implementations may also generate a privileged 
instruction program exception if a specified SPR field (for a move to/from SPR 
instruction) is not defined for a particular implementation but spr[0] = 1; in this 
case, the implementation may cause either a privileged instruction program exception 
or an illegal instruction program exception. 
• Trap—A trap program exception is generated when any of the conditions specified 
in a trap instruction is met. Trap instructions are described in Section 4.2.4.6, “Trap 
Instructions.” 
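The spr[0] test is easier to follow once the mtspr/mfspr encoding is explicit. The 10-bit SPR field in instruction bits 11-20 holds the SPR number with its two 5-bit halves swapped, so spr[0] corresponds to the 0x10 bit of the SPR number. The following C sketch assumes that encoding and uses DSISR (SPR 18), SRR0 (SPR 26), SPRG0 (SPR 272), and LR (SPR 8) as test values; the function names are hypothetical, and only SPR numbers that are defined are considered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Swaps the two 5-bit halves of a 10-bit value. Applied to an SPR number it
 * yields the instruction's spr field; applied to the field it yields the
 * SPR number again (the operation is its own inverse). */
static uint32_t spr_swap_halves(uint32_t v)
{
    return ((v & 0x1Fu) << 5) | ((v >> 5) & 0x1Fu);
}

/* Extracts the 10-bit spr field (instruction bits 11-20) of an mtspr/mfspr. */
static uint32_t spr_field_of(uint32_t insn)
{
    return (insn >> 11) & 0x3FFu;
}

/* spr[0] is the most-significant bit of the instruction's spr field. */
static bool spr0_set(uint32_t field)
{
    return ((field >> 9) & 1u) != 0;
}

/* For a defined SPR number n: true if a move to/from it with MSR[PR] = 1
 * would raise a privileged instruction program exception. */
static bool spr_privileged_violation(uint32_t n, bool msr_pr)
{
    return msr_pr && spr0_set(spr_swap_halves(n));
}
```

For example, mfspr r3,18 (reading the DSISR) carries 576 in its spr field; spr[0] is 1, so executing it with MSR[PR] = 1 raises a privileged instruction program exception, whereas reading LR (SPR 8) does not.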


The register settings when a program exception is taken are shown in Table 6-14. 


Table 6-14. Program Exception—Register Settings 


Register  Setting Description
SRR0      The contents of SRR0 differ according to the following situations:
          • For all program exceptions except floating-point enabled exceptions when operating
            in imprecise mode (MSR[FE0] ≠ MSR[FE1]), SRR0 contains the EA of the excepting
            instruction.
          • When the processor is in floating-point imprecise mode, SRR0 may contain the EA of
            the excepting instruction or that of a subsequent unexecuted instruction. If the
            subsequent instruction is sync or isync, SRR0 points no more than four bytes beyond
            the sync or isync instruction.
          • If FPSCR[FEX] = 1, but IEEE floating-point enabled exceptions are disabled
            (MSR[FE0] = MSR[FE1] = 0), the program exception occurs before the next
            synchronizing event if an instruction alters those bits (thus enabling the program
            exception). When this occurs, SRR0 points to the instruction that would have executed
            next and not to the instruction that modified the MSR.
SRR1      1-4    Cleared
          10     Cleared
          11     Set for an IEEE floating-point enabled program exception; otherwise cleared.
          12     Set for an illegal instruction program exception; otherwise cleared.
          13     Set for a privileged instruction program exception; otherwise cleared.
          14     Set for a trap program exception; otherwise cleared.
          15     Cleared if SRR0 contains the address of the instruction causing the exception,
                 and set if SRR0 contains the address of a subsequent instruction.
          16-23  Loaded with equivalent bits from the MSR
          25-27  Loaded with equivalent bits from the MSR
          30-31  Loaded with equivalent bits from the MSR
          Note that depending on the implementation, additional bits in the MSR may be copied to SRR1.
MSR       DR  0
          RI  0
          LE  Set to value of ILE


When a program exception is taken, instruction execution resumes at offset 0x00700 from 
the physical base address determined by MSR[IP]. 
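A program exception handler can determine which condition occurred from SRR1 bits 11-14, and whether SRR0 already points past the excepting instruction from SRR1 bit 15 (Table 6-14). In this hypothetical C sketch, the "resume address" is where a handler that has finished emulating or skipping the instruction might continue; that policy is an illustration, not architected behavior.

```c
#include <stdint.h>

/* SRR1 bits for a program exception (IBM numbering), following Table 6-14. */
#define PGM_BIT(n)      (UINT32_C(1) << (31u - (n)))
#define PGM_FP_ENABLED  PGM_BIT(11)
#define PGM_ILLEGAL     PGM_BIT(12)
#define PGM_PRIVILEGED  PGM_BIT(13)
#define PGM_TRAP        PGM_BIT(14)
#define PGM_SUBSEQUENT  PGM_BIT(15)  /* SRR0 holds a subsequent instruction's EA */

enum pgm_kind {
    PGM_KIND_UNKNOWN,
    PGM_KIND_FP,
    PGM_KIND_ILLEGAL,
    PGM_KIND_PRIVILEGED,
    PGM_KIND_TRAP,
};

/* Classifies the program exception recorded in SRR1. */
static enum pgm_kind program_exception_kind(uint32_t srr1)
{
    if (srr1 & PGM_FP_ENABLED) return PGM_KIND_FP;
    if (srr1 & PGM_ILLEGAL)    return PGM_KIND_ILLEGAL;
    if (srr1 & PGM_PRIVILEGED) return PGM_KIND_PRIVILEGED;
    if (srr1 & PGM_TRAP)       return PGM_KIND_TRAP;
    return PGM_KIND_UNKNOWN;
}

/* Hypothetical policy: continue at the next word after the excepting
 * instruction, unless SRR1[15] says SRR0 already points to a subsequent
 * instruction, in which case continue at SRR0 itself. */
static uint32_t program_resume_address(uint32_t srr0, uint32_t srr1)
{
    return (srr1 & PGM_SUBSEQUENT) ? srr0 : srr0 + 4u;
}
```

For example, a trap taken at 0x00001000 (SRR1 = 0x00020000) would, under this policy, continue at 0x00001004.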





6.4.8 Floating-Point Unavailable Exception (0x00800) 


A floating-point unavailable exception occurs when no higher priority exception exists, an 
attempt is made to execute a floating-point instruction (including floating-point load, store, 
or move instructions), and the floating-point available bit in the MSR is cleared 
(MSR[FP] = 0). 
The register settings for floating-point unavailable exceptions are shown in Table 6-15. 


Table 6-15. Floating-Point Unavailable Exception—Register Settings 


Register  Setting Description
SRR0      Set to the effective address of the instruction that caused the exception.
SRR1      1-4    Cleared
          10-15  Cleared
          16-23  Loaded with equivalent bits from the MSR
          25-27  Loaded with equivalent bits from the MSR
          30-31  Loaded with equivalent bits from the MSR
          Note that depending on the implementation, additional bits in the MSR may be copied to SRR1.
MSR       DR  0
          RI  0
          LE  Set to value of ILE





When a floating-point unavailable exception is taken, instruction execution resumes at 
offset 0x00800 from the physical base address determined by MSR[IP]. 


6.4.9 Decrementer Exception (0x00900) 


A decrementer exception occurs when no higher priority exception exists, a decrementer 
exception condition occurs (for example, the decrementer register has completed 
decrementing), and MSR[EE] = 1. The decrementer register counts down, causing an 
exception request when it passes through zero. A decrementer exception request remains 
pending until the decrementer exception is taken and then it is cancelled. The decrementer 
implementation meets the following requirements: 


• The counters for the decrementer and the time-base counter are driven by the same 
fundamental time base. 


• Loading a GPR from the decrementer does not affect the decrementer. 


• Storing a GPR value to the decrementer replaces the value in the decrementer with 
the value in the GPR. 


• Whenever bit 0 of the decrementer changes from 0 to 1, a decrementer exception 
request is signaled. If multiple decrementer exception requests are received before 
the first can be reported, only one exception is reported. The occurrence of a 
decrementer exception cancels the request. 


• If the decrementer is altered by software and if bit 0 is changed from 0 to 1, an 
exception request is signaled. 
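The decrementer rules above (a request on each 0-to-1 change of bit 0, coalescing of multiple requests into one, cancellation when the exception is taken, and gating by MSR[EE]) can be captured in a small software model. This C sketch is a behavioral illustration only: the structure and function names are hypothetical, and the real counter is clocked by the time base rather than by explicit calls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Behavioral model of the decrementer and its single pending request. */
struct dec_model {
    uint32_t dec;   /* current decrementer value */
    bool pending;   /* at most one request, however many transitions occur */
};

/* Bit 0 in IBM numbering is the most-significant bit. */
static bool dec_bit0(uint32_t v)
{
    return (v >> 31) != 0;
}

/* Replaces the decrementer value (as a store from a GPR would); a 0-to-1
 * change of bit 0 signals a request. */
static void dec_write(struct dec_model *d, uint32_t value)
{
    if (!dec_bit0(d->dec) && dec_bit0(value))
        d->pending = true;
    d->dec = value;
}

/* One time-base tick: count down by one, wrapping from 0 to 0xFFFFFFFF,
 * which is where bit 0 changes from 0 to 1. */
static void dec_tick(struct dec_model *d)
{
    dec_write(d, d->dec - 1u);
}

/* The exception is taken only when MSR[EE] = 1; taking it cancels the request. */
static bool dec_take_exception(struct dec_model *d, bool msr_ee)
{
    if (!msr_ee || !d->pending)
        return false;
    d->pending = false;
    return true;
}
```

For example, starting from a value of 2, the third tick moves the decrementer from 0 to 0xFFFFFFFF and raises a request that stays pending until MSR[EE] = 1 allows the exception to be taken.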
The register settings for the decrementer exception are shown in Table 6-16. 


Table 6-16. Decrementer Exception—Register Settings 
Register  Setting Description
SRR0      Set to the effective address of the instruction that the processor would have attempted to
          execute next if no exception conditions were present.
SRR1      1-4    Cleared
          10-15  Cleared
          16-23  Loaded with equivalent bits from the MSR
          25-27  Loaded with equivalent bits from the MSR
          30-31  Loaded with equivalent bits from the MSR
          Note that depending on the implementation, additional bits in the MSR may be copied to SRR1.
MSR       DR  0
          RI  0
          LE  Set to value of ILE





When a decrementer exception is taken, instruction execution resumes at offset 0x00900 
from the physical base address determined by MSR[IP]. 


6.4.10 System Call Exception (0x00C00) 


A system call exception occurs when a System Call (sc) instruction is executed. The 
effective address of the instruction following the sc instruction is placed into SRR0. MSR 
bits are saved in SRR1, as shown in Table 6-17. Then a system call exception is generated. 


The system call exception causes the next instruction to be fetched from offset 0x00C00 
from the physical base address determined by the new setting of MSR[IP]. As with most 
other exceptions, this exception is context-synchronizing. Refer to Section 6.1.2.1, 
“Context Synchronization,” for more information on the actions performed by a 
context-synchronizing operation. Register settings are shown in Table 6-17. 


Table 6-17. System Call Exception—Register Settings 


Register  Setting Description
SRR0      Set to the effective address of the instruction following the System Call instruction.
SRR1      1-4    Cleared
          10-15  Cleared
          16-23  Loaded with equivalent bits from the MSR
          25-27  Loaded with equivalent bits from the MSR
          30-31  Loaded with equivalent bits from the MSR
          Note that depending on the implementation, additional bits in the MSR may be copied to SRR1.
MSR       DR  0
          RI  0
          LE  Set to value of ILE
When a system call exception is taken, instruction execution resumes at offset 0x00C00 
from the physical base address determined by MSR[IP]. 


6.4.11 Trace Exception (0x00D00) 


The trace exception is optional to the PowerPC architecture, and specific information about 
how it is implemented can be found in user’s manuals for individual processors. 


The trace exception provides a means of tracing the flow of control of a program for 
debugging and performance analysis purposes. It is controlled by MSR bits SE and BE as 
follows: 


• MSR[SE] = 1: the processor generates a single-step type trace exception after each 
instruction that completes without causing an exception or context change (for example, 
no single-step trace exception follows the execution of an sc or rfi instruction, or of a 
load instruction that causes an exception). 


• MSR[BE] = 1: the processor generates a branch-type trace exception after 
completing the execution of a branch instruction, whether or not the branch is taken. 


If this facility is implemented, a trace exception occurs when no higher priority exception 
exists and either of the conditions described above exists. The following are not traced: 

• rfi instruction 

• sc instructions, and trap instructions that trap 

• Other instructions that cause exceptions (other than trace exceptions) 

• The first instruction of any exception handler 

• Instructions that are emulated by software 


MSR[SE, BE] are both cleared when the trace exception is taken. In the normal use of this 
function, MSR[SE, BE] are restored when the exception handler returns to the interrupted 
program using an rfi instruction. 
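The tracing rules can be summarized as a decision function. In this hypothetical C sketch, the caller describes each completed instruction with flags. When MSR[SE] and MSR[BE] are both set and the instruction is a branch, the sketch reports a single-step trace; that precedence is an assumption, because this section does not specify it.

```c
#include <stdbool.h>

/* Hypothetical description of a completed instruction, for illustration. */
struct completed_insn {
    bool is_branch;         /* any branch, taken or not */
    bool caused_exception;  /* includes sc and trap instructions that trap */
    bool is_rfi;
    bool first_in_handler;  /* first instruction of an exception handler */
    bool emulated;          /* emulated by software */
};

enum trace_kind { TRACE_NONE, TRACE_SINGLE_STEP, TRACE_BRANCH };

/* Decides which trace exception, if any, follows 'insn' given MSR[SE] and MSR[BE]. */
static enum trace_kind trace_after(const struct completed_insn *insn,
                                   bool msr_se, bool msr_be)
{
    if (insn->is_rfi || insn->caused_exception ||
        insn->first_in_handler || insn->emulated)
        return TRACE_NONE;               /* these are never traced */
    if (msr_se)
        return TRACE_SINGLE_STEP;        /* assumed to take precedence over BE */
    if (msr_be && insn->is_branch)
        return TRACE_BRANCH;
    return TRACE_NONE;
}
```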
Register settings for the trace exception are described in Table 6-18.


Table 6-18. Trace Exception—Register Settings 
Setting Description 
SRR0 Set to the effective address of the next instruction to be executed in the program for which the trace
exception was generated. 


SRR1 1-4 Cleared 
10-15 Cleared 
16-23 Loaded with equivalent bits from the MSR 
25-27 Loaded with equivalent bits from the MSR 
30-31 Loaded with equivalent bits from the MSR 
Note that depending on the implementation, additional bits in the MSR may be copied to SRR1. 


MSR DR 0
RI 0 
LE Set to value of ILE 
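The SRR1 mask in Table 6-18 and the rfi round trip described above can be sketched as follows. This model assumes the 32-bit MSR bit positions from Section 2.3.1 (ILE = bit 15, SE = bit 21, BE = bit 22, DR = bit 27, RI = bit 30, LE = bit 31, where bit k corresponds to the mask 1 << (31 - k)) and assumes that rfi restores the same MSR bit ranges that are saved in SRR1. It changes only the MSR bits shown in the table plus SE and BE; any other bits an implementation alters on exception entry are not modeled.

```c
#include <stdint.h>

/* Assumed 32-bit MSR masks (bit k == 1u << (31 - k)). */
#define MSR_ILE 0x00010000u   /* bit 15 */
#define MSR_SE  0x00000400u   /* bit 21 */
#define MSR_BE  0x00000200u   /* bit 22 */
#define MSR_DR  0x00000010u   /* bit 27 */
#define MSR_RI  0x00000002u   /* bit 30 */
#define MSR_LE  0x00000001u   /* bit 31 */

/* SRR1 bits 16-23, 25-27, and 30-31 come from the MSR:
   0xFF00 | 0x0070 | 0x0003 == 0xFF73. Bits 1-4 and 10-15 end up cleared. */
#define SRR1_MSR_MASK 0x0000FF73u

struct cpu { uint32_t pc, msr, srr0, srr1; };

/* Trace-exception entry, limited to the settings in Table 6-18. */
static void take_trace_exception(struct cpu *c, uint32_t next_pc, uint32_t vector)
{
    c->srr0 = next_pc;
    c->srr1 = c->msr & SRR1_MSR_MASK;
    uint32_t msr = c->msr & ~(MSR_SE | MSR_BE | MSR_DR | MSR_RI);
    msr = (msr & ~MSR_LE) | ((c->msr & MSR_ILE) ? MSR_LE : 0u);  /* LE = ILE */
    c->msr = msr;
    c->pc = vector;
}

/* rfi: restore the saved MSR bit ranges and resume at SRR0. */
static void rfi(struct cpu *c)
{
    c->msr = (c->msr & ~SRR1_MSR_MASK) | (c->srr1 & SRR1_MSR_MASK);
    c->pc = c->srr0;
}
```

Because SE and BE lie within bits 16-23, they are saved in SRR1 on entry and restored by rfi, so tracing continues in the interrupted program.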


When a trace exception is taken, instruction execution resumes at offset 0x00D00 from the
base address determined by MSR[IP]. 
6.4.12 Floating-Point Assist Exception (0x00E00)


The floating-point assist exception is optional to the PowerPC architecture. It can be used 
to allow software to assist in the following situations: 


• Execution of floating-point instructions for which an implementation uses software
routines to perform certain operations, such as those involving denormalization.


• Execution of floating-point instructions that are not optional and are not
implemented in hardware. In this case, the processor may generate an illegal
instruction type program exception instead.


Register settings for the floating-point assist exceptions are described in Table 6-19. 


Table 6-19. Floating-Point Assist Exception—Register Settings 
Setting Description 
SRR0 Set to the address of the next instruction to be executed in the program for which the floating-point
assist exception was generated. 


SRR1 1-4 Implementation-specific information 
10-15 Implementation-specific information 
16-23 Loaded with equivalent bits from the MSR 
25-27 Loaded with equivalent bits from the MSR 
30-31 Loaded with equivalent bits from the MSR 
Note that depending on the implementation, additional bits in the MSR may be copied to SRR1. 


MSR DR 0
RI 0 
LE Set to value of ILE 


When a floating-point assist exception is taken, instruction execution resumes at offset
0x00E00 from the base address determined by MSR[IP].
Chapter 7
Memory Management 


This chapter describes the memory management unit (MMU) specifications provided by
the PowerPC operating environment architecture (OEA) for PowerPC processors. The 
primary function of the MMU in a PowerPC processor is to translate logical (effective) 
addresses to physical addresses (referred to as real addresses in the architecture 
specification) for memory accesses and I/O accesses (most I/O accesses are assumed to be 
memory-mapped). In addition, the MMU provides various levels of access protection on a 
segment, block, or page basis. Note that there are many aspects of memory management 
that are implementation-dependent. This chapter describes the conceptual model of a 
PowerPC MMU; however, PowerPC processors may differ in the specific hardware used to 
implement the MMU model of the OEA, depending on the many design trade-offs inherent
in each implementation. 


Two general types of memory accesses generated by PowerPC processors require address
translation: instruction accesses, and data accesses generated by load and store
instructions. In addition, the addresses specified by cache instructions and the optional
external control instructions also require translation. Generally, the address translation 
mechanism is defined in terms of the segment descriptors and page tables that PowerPC
processors use to locate the effective-to-physical address mapping for memory accesses.
The segment information translates the effective address to an interim virtual address, and 
the page table information translates the virtual address to a physical address. 


The definition of the segment and page table data structures provides significant flexibility 
for the implementation of performance enhancement features in a wide range of processors. 
Therefore, the performance enhancements used to store the segment or page table 
information on-chip vary from implementation to implementation. 


Translation lookaside buffers (TLBs) are commonly implemented in PowerPC processors 
to keep recently-used page address translations on-chip. Although their exact 
characteristics are not specified in the OEA, the general concepts that are pertinent to the 
system software are described. 
The segment information, used to generate the interim virtual addresses, is stored as
segment descriptors. These descriptors may reside in on-chip segment registers (32-bit 
implementations) or as segment table entries (STEs) in memory (64-bit implementations). 
In much the same way that TLBs cache recently-used page address translations, 64-bit 
processors may contain segment lookaside buffers (SLBs) on-chip that cache recently-used 
segment table entries. Although the exact characteristics of SLBs are not specified, there is 
general information pertinent to those implementations that provide SLBs. 


The block address translation (BAT) mechanism is a software-controlled array that stores 
the available block address translations on-chip. BAT array entries are implemented as pairs 
of BAT registers that are accessible as supervisor special-purpose registers (SPRs). 


The MMU, together with the exception processing mechanism, provides the necessary 
support for the operating system to implement a paged virtual memory environment and for 
enforcing protection of designated memory areas. Exception processing is described in 
Chapter 6, “Exceptions.” Section 2.3.1, “Machine State Register (MSR),” describes the 
MSR, which controls some of the critical functionality of the MMU. (Note that the 
architecture specification refers to exceptions as interrupts.) 


Information about 64-bit-only features can be found in PowerPC Microprocessor Family: 
The Programming Environments, which describes both the 32- and 64-bit memory models 
defined by the PowerPC architecture. 


7.1 MMU Features 


The MMU of a 32-bit PowerPC processor provides 4 Gbytes of effective address space, a 
52-bit interim virtual address, and physical addresses that are ≤ 32 bits in length. Note that
this chapter describes address translation mechanisms from the perspective of the 
programming model. As such, it describes the structure of the page and segment tables, the 
MMU conditions that cause exceptions, the instructions provided for programming the 
MMU, and the MMU registers. The hardware implementation details of a particular MMU 
(including whether the hardware automatically performs a page table search in memory) 
are not contained in the architectural definition of PowerPC processors and are invisible to 
the PowerPC programming model; therefore, they are not described in this document. If
part of the OEA model is implemented with a software-assist mechanism, that software should
be contained in the area of memory reserved for implementation-specific use and should not
be visible to the operating system.
7.2 MMU Overview


The PowerPC MMU and exception models support demand-paged virtual memory. Virtual 
memory management permits execution of programs larger than the size of physical 
memory; the term demand paged implies that individual pages are loaded into physical 
memory from backing storage only as they are accessed by an executing program. 


The memory management model includes the concept of a virtual address space that is larger
not only than the maximum physical memory allowed but also than the effective address
space. Effective addresses are 32 bits wide. In the address translation process, the
processor converts an effective address to a 52-bit virtual address, using the information
in the selected descriptor. The virtual address is then translated back to a physical
address that is no larger than the effective address.
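A minimal sketch of the effective-to-virtual step, written in C. It assumes the 32-bit layout in which EA0-EA3 select one of the 16 segment registers (see Section 7.2.6.2), pages are 4 Kbytes, and the segment register supplies a virtual segment ID (VSID). Because a segment covers 256 Mbytes (28 bits of offset), a 52-bit virtual address leaves 52 - 28 = 24 bits for the VSID. The `sr_vsid` array is a hypothetical stand-in for the segment registers.

```c
#include <stdint.h>

/* Field extraction for a 32-bit effective address (bit 0 = MSB). */
static unsigned ea_segment(uint32_t ea)     { return ea >> 28; }             /* EA0-EA3   */
static unsigned ea_page_index(uint32_t ea)  { return (ea >> 12) & 0xFFFFu; } /* EA4-EA19  */
static unsigned ea_byte_offset(uint32_t ea) { return ea & 0xFFFu; }          /* EA20-EA31 */

/* Form the 52-bit virtual address VSID || page index || byte offset.
   sr_vsid[] hypothetically holds the 24-bit VSID of each segment register. */
static uint64_t effective_to_virtual(uint32_t ea, const uint32_t sr_vsid[16])
{
    uint64_t vsid = sr_vsid[ea_segment(ea)] & 0xFFFFFFu;   /* 24 bits */
    return (vsid << 28)
         | ((uint64_t)ea_page_index(ea) << 12)
         | ea_byte_offset(ea);
}
```

With VSID 0xABCDEF in segment register 12, for example, EA 0xC123_4567 maps to virtual address 0xABCDEF1234567.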


Note that in the cases that implementations support a physical address range that is smaller 
than 32 bits, the high-order bits of the effective address may be ignored in the address 
translation process. The remainder of this chapter assumes that implementations support 
the maximum physical address range. 


The operating system manages the system’s physical memory resources. Consequently, the 
operating system initializes the MMU registers (segment registers, BAT registers, and 
SDR1 register) and sets up page tables in memory appropriately. The MMU then assists the
operating system by managing page status and optionally caching the recently-used address 
translation information on-chip for quick access. 


Effective address spaces are divided into 256-Mbyte regions called segments or into other 
large regions called blocks (128 Kbyte—256 Mbyte). Segments that correspond to memory- 
mapped areas can be further subdivided into 4-Kbyte pages. For each block or page, the 
operating system creates an address descriptor (page table entry (PTE) or BAT array entry); 
the MMU then uses these descriptors to generate the physical address, the protection 
information, and other access control information each time an address within the block or 
page is accessed. Address descriptors for pages reside in tables (as PTEs) in physical 
memory; for faster accesses, the MMU often caches on-chip copies of recently-used PTEs 
in an on-chip TLB. The MMU keeps the block information on-chip in the BAT array 
(comprised of the BAT registers). 


This section provides an overview of the high-level organization and operational concepts 
of the MMU in PowerPC processors, and a summary of all MMU control registers. For 
more information about the MSR, see Section 2.3.1, “Machine State Register (MSR).” 
Section 7.4.3, “BAT Register Implementation of BAT Array,” describes the BAT registers, 
Section 7.5.2.1, “Segment Descriptor Definitions,” describes the segment registers, and 
Section 7.6.1.1, “SDR1 Register Definitions,” describes the SDR1. 
7.2.1 Memory Addressing


A program references memory using the effective (logical) address computed by the 
processor when it executes a load, store, branch, or cache instruction, and when it fetches 
the next instruction. The effective address is translated to a physical address according to 
the procedures described throughout this chapter. The memory subsystem uses the physical 
address for the access. 


7.2.1.1 Effective Addresses in 32-Bit Mode 


In addition to the 64- and 32-bit memory management models defined by the OEA, the
PowerPC architecture also defines a 32-bit mode of operation for 64-bit implementations. 
In this 32-bit mode (MSR[SF] = 0), the 64-bit effective address is first calculated as usual, 
and then the high-order 32 bits of the EA are treated as zero for the purposes of addressing 
memory. This occurs for both instruction and data accesses, and occurs independently from 
the setting of the MSR[IR] and MSR[DR] bits that enable instruction and data address 
translation, respectively. The truncation of the EA is the only way in which memory 
accesses are affected by the 32-bit mode of operation. 


For a complete discussion of effective address calculation, see Section 4.1.4.2, “Effective 
Address Calculation.” 


7.2.1.2 Predefined Physical Memory Locations 

There are four areas of the physical memory map that have predefined uses. The first 256 
bytes of physical memory (or if MSR[IP] = 1, the first 256 bytes of memory located at 
physical address 0xFFF0_0000) are assigned for arbitrary use by the operating system. The
rest of that first page of physical memory defined by the vector base address (determined 
by MSR[IP]) is either used for exception vectors, or reserved for future exception vectors. 
The third predefined area of memory consists of the second and third physical pages of the 
memory map, which are used for implementation-specific purposes. In some 
implementations, the second and third pages located at physical address 
0xFFF0_1000 when MSR[IP] = 1 are also used for implementation-specific purposes.
Fourthly, the system software defines the locations in physical memory that contain the 
page address translation tables. These predefined memory areas are summarized in 
Table 7-1 in terms of the variable ‘Base’. 


Table 7-1. Predefined Physical Memory Locations 


Physical Address Range                                      Predefined Use
Base || 0x0_0000 – Base || 0x0_00FF                         Operating system
Base || 0x0_0100 – Base || 0x0_0FFF                         Exception vectors
Base || 0x0_1000 – Base || 0x0_2FFF                         Implementation-specific¹
Software-specified contiguous sequence of physical pages    Page table

¹ Only valid for MSR[IP] = 1 on some implementations
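As an illustration of Table 7-1, the following sketch classifies a physical address into the fixed predefined areas. It assumes that 'Base' is the 12-bit value forming the high-order bits of the address (0x000, or 0xFFF when MSR[IP] = 1, consistent with the 0xFFF0_0000 address given above). The page table area is software-specified and is therefore not detected. All names are hypothetical.

```c
#include <stdint.h>

enum predef_use { USE_OS, USE_EXCEPTION_VECTORS, USE_IMPL_SPECIFIC, USE_OTHER };

/* Classify pa against the fixed areas of Table 7-1.
   The full address is Base (12 bits) || a 20-bit offset. */
static enum predef_use classify_phys(uint32_t pa, uint32_t base12)
{
    if ((pa >> 20) != (base12 & 0xFFFu))
        return USE_OTHER;
    uint32_t low = pa & 0xFFFFFu;             /* the 20 bits after Base */
    if (low <= 0x000FFu) return USE_OS;                 /* first 256 bytes */
    if (low <= 0x00FFFu) return USE_EXCEPTION_VECTORS;  /* rest of page 0 */
    if (low <= 0x02FFFu) return USE_IMPL_SPECIFIC;      /* pages 1 and 2 */
    return USE_OTHER;
}
```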
Table 7-2 decodes the actual value of ‘Base’. Refer to Chapter 6, “Exceptions,” for more
detailed information on the assignment of the exception vector offsets. 


Table 7-2. Value of Base for Predefined Memory Use 

MSR[IP]    Value of Base
0          0x000
1          0xFFF


7.2.2 MMU Organization 


Figure 7-1 shows a conceptual block diagram of the MMU in a 32-bit implementation. The 
32-bit MMU implementation differs from the 64-bit implementation in that after an address 
is generated, the high-order bits of the effective address, EA0-EA19 (or a smaller set of
address bits, EA0-EAn, in the case of blocks), are translated into physical address bits
PA0-PA19. The low-order address bits, A20-A31, are untranslated and therefore identical
for both effective and physical addresses. After translating the address, the MMU passes the
resulting 32-bit physical address to the memory subsystem. 
7.2.3 Address Translation Mechanisms
PowerPC processors support the following three types of address translation: 
• Page address translation: translates the page frame address for a 4-Kbyte page size


• Block address translation: translates the block number for blocks that range in size
from 128 Kbyte to 256 Mbyte


• Real addressing mode address translation: when address translation is disabled, the
physical address is identical to the effective address.


In addition, earlier processors implement a direct-store facility that is used to generate 
direct-store interface accesses on the external bus. Note that this facility is not optimized 
for performance and was present for compatibility with POWER devices. Future devices 
are not likely to support it; software should not depend on its effects and new software 
should not use it. 


Figure 7-2 shows the address translation mechanisms provided by the MMU. The segment 
descriptors shown in the figure control both the page and direct-store segment address 
translation mechanisms. When an access uses the page or direct-store segment address 
translation, the appropriate segment descriptor is required. One of the 16 on-chip segment 
registers (which contain the segment descriptors) is selected by the highest-order effective 
address bits. 


A control bit in the corresponding segment descriptor then determines if the access is to 
memory (memory-mapped) or to a direct-store segment. Note that the direct-store interface 
is present to allow certain older I/O devices to use this interface. When an access is 
determined to be to the direct-store interface space, the implementation invokes an 
elaborate hardware protocol for communication with these devices. The direct-store 
interface protocol is not optimized for performance, and therefore, its use is discouraged. 
The most efficient method for accessing I/O is by memory-mapping the I/O areas. 


For memory accesses translated by a segment descriptor, the interim virtual address is 
generated using the information in the segment descriptor. Page address translation 
corresponds to the conversion of this virtual address into the 32-bit physical address used 
by the memory subsystem. In some cases, the physical address for the page resides in an 
on-chip TLB and is available for quick access. However, if the page address translation 
misses in a TLB, the MMU searches the page table in memory (using the virtual address 
information and a hashing function) to locate the required physical address. Some 
implementations may have dedicated hardware to perform the page table search 
automatically, while others may define an exception handler routine that searches the page 
table with software. 


Because blocks are larger than pages, fewer high-order effective address bits are translated
into physical address bits for block address translation; more low-order address bits (at
least 17) are left untranslated to form the offset into the block. Also, instead of
segment descriptors and a page table, block address translations use the on-chip BAT
registers as a BAT array. If an effective address matches the corresponding field of a BAT
register, the information in the BAT register is used to generate the physical address; in this 
case, the results of the page translation (occurring in parallel) are ignored. Note that a 
matching BAT array entry takes precedence over a translation provided by the segment 
descriptor in all cases (even if the segment is a direct-store segment). 
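The block case can be sketched as follows. The `bat_entry` structure is hypothetical and does not use the real BAT register field encodings (Section 7.4.3); it only shows that a block of 2^n bytes leaves the low n bits (n ≥ 17 for the 128-Kbyte minimum) untranslated and replaces the high-order bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one BAT array entry. */
struct bat_entry {
    uint32_t eff_base;   /* block effective base, aligned to size */
    uint32_t phys_base;  /* block physical base, aligned to size */
    uint32_t size;       /* power of two, 128 Kbytes to 256 Mbytes */
};

static bool bat_size_ok(uint32_t size)
{
    return size >= (1u << 17) && size <= (1u << 28) && (size & (size - 1)) == 0;
}

/* If ea falls in the block, store the physical address and return true.
   The low-order log2(size) bits pass through untranslated. */
static bool bat_translate(const struct bat_entry *b, uint32_t ea, uint32_t *pa)
{
    uint32_t offset_mask = b->size - 1;
    if ((ea & ~offset_mask) != (b->eff_base & ~offset_mask))
        return false;                               /* no match: not this block */
    *pa = (b->phys_base & ~offset_mask) | (ea & offset_mask);
    return true;
}
```

For an 8-Mbyte block at effective 0x4000_0000 mapped to physical 0x0080_0000, EA 0x4012_3456 translates to 0x0092_3456.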
Direct-store address translation is used when the optional direct-store translation control bit
(T bit) in the corresponding segment descriptor is set. In this case, the remaining 
information in the segment descriptor is interpreted as identifier information that is used 
with the remaining effective address bits to generate the protocol used in a direct-store 
interface access on the external interface; additionally, no TLB lookup or page table search 
is performed. Note that this facility is not likely to be supported in future processors. 


When the processor generates an access, and the corresponding address translation enable 
bit in MSR is cleared, the resulting physical address is identical to the effective address and 
all other translation mechanisms are ignored. Instruction and data address translation is 
enabled by setting the MSR[IR] and MSR[DR] bits, respectively. See Section 7.2.6.1, 
“Real Addressing Mode and Block Address Translation Selection,” for more information. 


7.2.4 Memory Protection Facilities 


In addition to the translation of effective addresses to physical addresses, the MMU 
provides access protection of supervisor areas from user access and can designate areas of 
memory as read-only as well as no-execute. Table 7-3 shows the eight protection options 
supported by the MMU for pages. 


Table 7-3. Access Protection Options for Pages

                                               User Read         User    Supervisor Read   Supervisor
Option                                         I-Fetch   Data    Write   I-Fetch   Data    Write
Supervisor-only                                ✗         ✗       ✗       ✓         ✓       ✓
Supervisor-only-no-execute                     ✗         ✗       ✗       ✗         ✓       ✓
Supervisor-write-only                          ✓         ✓       ✗       ✓         ✓       ✓
Supervisor-write-only-no-execute               ✗         ✓       ✗       ✗         ✓       ✓
Both (user/supervisor)                         ✓         ✓       ✓       ✓         ✓       ✓
Both (user/supervisor)-no-execute              ✗         ✓       ✓       ✗         ✓       ✓
Both (user/supervisor) read-only               ✓         ✓       ✗       ✓         ✓       ✗
Both (user/supervisor) read-only-no-execute    ✗         ✓       ✗       ✗         ✓       ✗

✓ Access permitted
✗ Protection violation
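Table 7-3 can be encoded directly as a permission check. This sketch models only the eight options in the table; it does not model how the options are derived from the key bits in the segment descriptor and the protection bits in the PTE, and the enumeration names are hypothetical.

```c
#include <stdbool.h>

enum page_prot {
    SUP_ONLY, SUP_ONLY_NX, SUP_WRITE_ONLY, SUP_WRITE_ONLY_NX,
    BOTH_RW, BOTH_RW_NX, BOTH_RO, BOTH_RO_NX
};
enum mem_access { IFETCH, DATA_READ, DATA_WRITE };

/* true = access permitted, false = protection violation (Table 7-3). */
static bool page_access_ok(enum page_prot p, enum mem_access a, bool user)
{
    bool nx = (p == SUP_ONLY_NX || p == SUP_WRITE_ONLY_NX ||
               p == BOTH_RW_NX || p == BOTH_RO_NX);
    if (a == IFETCH && nx)
        return false;                         /* no-execute variants */

    switch (p) {
    case SUP_ONLY: case SUP_ONLY_NX:
        return !user;                         /* supervisor reads and writes only */
    case SUP_WRITE_ONLY: case SUP_WRITE_ONLY_NX:
        return a != DATA_WRITE || !user;      /* all may read; supervisor writes */
    case BOTH_RW: case BOTH_RW_NX:
        return true;
    case BOTH_RO: case BOTH_RO_NX:
        return a != DATA_WRITE;               /* reads only, for both modes */
    }
    return false;
}
```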
The no-execute option provided in the segment descriptor lets the operating system
program whether or not instructions can be fetched from an area of memory. The remaining 
options are enforced based on a combination of information in the segment descriptor and 
the page table entry. For example, the supervisor-only option allows only read and write operations
generated while the processor is operating in supervisor mode (MSR[PR] = 0) to access the 
page. User accesses that map into a supervisor-only page cause an exception. 


Note that independently of the protection mechanisms, care must be taken when writing to 
instruction areas as coherency must be maintained with on-chip copies of instructions that 
may have been prefetched into a queue or an instruction cache. Refer to Section 5.1.5.2, 
“Instruction Cache Instructions,” for more information on coherency within instruction 
areas. 


As shown in the table, the supervisor-write-only option allows both user and supervisor 
accesses to read from the page, but only supervisor programs can write to that area. There 
is also an option that allows both supervisor and user programs read and write access (both 
user/supervisor option), and finally, there is an option to designate a page as read-only, both 
for user and supervisor programs (both read-only option). 


For areas of memory that are translated by the block address translation mechanism, the 
protection options are similar, except that blocks are translated by separate mechanisms for 
instruction and data, blocks do not have a no-execute option, and blocks can be designated 
as enabled for user and supervisor accesses independently. Therefore, a block can be 
designated as supervisor-only, for example, but this block can be programmed such that all 
user accesses simply ignore the block translation, rather than take an exception in the case 
of a match. This provides a flexible way for supervisor and user programs to use overlapping
effective address space areas that map to unique physical address areas (without exceptions 
occurring). 
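The per-mode enabling of block translations can be sketched as follows. It assumes the two valid bits of the upper BAT register (Vs for supervisor mode and Vp for user mode, described in Section 7.4.3) and that MSR[PR] is bit 17 (mask 0x4000). When an entry is not valid for the current mode, it simply does not participate, so the access continues to segment and page translation rather than raising an exception.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_PR 0x00004000u  /* MSR bit 17: 1 = user mode (assumed layout) */

/* Hypothetical pair of per-mode valid bits for one BAT entry. */
struct bat_valid { bool vs; bool vp; };

/* Does this BAT entry take part in translation for the current mode? */
static bool bat_enabled_for(struct bat_valid v, uint32_t msr)
{
    bool user = (msr & MSR_PR) != 0;
    return user ? v.vp : v.vs;
}
```

A supervisor-only block is then simply `{ .vs = true, .vp = false }`: user accesses to the same effective addresses fall through to page translation.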


For direct-store segments, the MMU calculates a key bit based on the protection values 
programmed in the segment descriptor and the specific user/supervisor and read/write 
information for the particular access. However, this bit is merely passed on to the system 
interface to be transmitted in the context of the direct-store interface protocol. The MMU 
does not itself enforce any protection or cause any exception based on the state of the key 
bit for these accesses. The I/O controller device or other external hardware can optionally 
use this bit to enforce any protection required. Note that future devices are not likely to 
implement the direct-store facility. 


Finally, a facility in the VEA and OEA allows pages or blocks to be designated as guarded, 
preventing out-of-order accesses that may cause undesired side effects. For example, areas 
of the memory map used to control I/O devices can be marked as guarded so accesses do 
not occur unless they are explicitly required by the program. Refer to Section 5.2.1.5.3, 
“Out-of-Order Accesses to Guarded Memory,” for a complete description of how accesses 
to guarded memory are restricted. 
7.2.5 Page History Information


The MMUs of PowerPC processors also define referenced (R) and changed (C) bits in the 
page address translation mechanism that can be used as history information relevant to the 
page. The operating system can use these bits to determine which areas of memory to write 
back to disk when new pages must be allocated in main memory. While these bits are 
initially programmed by the operating system into the page table, the architecture specifies 
that the R and C bits are maintained by the processor and the processor updates these bits 
when required. 


7.2.6 General Flow of MMU Address Translation 


The following sections describe the general flow used by PowerPC processors to translate 
effective addresses to virtual and then physical addresses. Note that although there are 
references to the concept of an on-chip TLB, these entities may not be present in a particular 
hardware implementation for performance enhancement (and a particular implementation 
may have one or more TLBs). Thus, they are shown here as optional and only the software 
ramifications of the existence of a TLB are discussed. 


7.2.6.1 Real Addressing Mode and Block Address Translation 
Selection 

When an instruction or data access is generated and the corresponding instruction or data
translation is disabled (MSR[IR] = 0 or MSR[DR] = 0), real addressing mode translation is
used (physical address equals effective address) and the access continues to the memory
subsystem as described in Section 7.3, “Real Addressing Mode.”


Figure 7-3 shows the flow the MMU uses in determining whether to select real addressing 
mode, block address translation, or the segment descriptor (to select either direct-store or 
page address translation). 
Figure 7-3. General Flow of Address Translation (Real Addressing Mode and Block)


Note that if the BAT array search results in a hit, the access is qualified with the appropriate 
protection bits. If the access is determined to be protected (not allowed), an exception (ISI 
or DSI exception) is generated. 


7.2.6.2 Page and Direct-Store Address Translation Selection 


If address translation is enabled (real addressing mode translation not selected) and the 
effective address information does not match a BAT array entry, the segment descriptor 
must be located. When the segment descriptor is located, the T bit in the segment descriptor 
selects whether the translation is to a page or to a direct-store segment as shown in 
Figure 7-4. In addition, Figure 7-4 also shows the way in which the no-execute protection 
is enforced; if the N bit in the segment descriptor is set and the access is an instruction fetch, 
the access is faulted. 
Figure 7-4. General Flow of Page and Direct-Store Address Translation
For 32-bit implementations, the segment descriptor for an access is contained in one of 16
on-chip segment registers; effective address bits EA0-EA3 select one of the 16 segment
registers. 
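The selection logic of Figures 7-3 and 7-4 can be summarized in a single function. This is a simplified sketch: the `segdesc` structure holds only the T and N bits, the BAT search result is passed in by the caller, protection checks on the selected path are not modeled, and the MSR masks assume IR = bit 26 and DR = bit 27 (0x20 and 0x10).

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IR 0x00000020u   /* assumed: instruction translation enable */
#define MSR_DR 0x00000010u   /* assumed: data translation enable */

enum xlate_path { XL_REAL, XL_BLOCK, XL_PAGE, XL_DIRECT_STORE, XL_ISI_FAULT };

/* Hypothetical, minimal segment descriptor: only the T and N bits. */
struct segdesc { bool t; bool n; };

static enum xlate_path select_translation(uint32_t msr, bool is_ifetch, uint32_t ea,
                                          bool bat_hit, const struct segdesc sr[16])
{
    bool enabled = is_ifetch ? (msr & MSR_IR) : (msr & MSR_DR);
    if (!enabled)
        return XL_REAL;                    /* physical address == effective address */
    if (bat_hit)
        return XL_BLOCK;                   /* a BAT match takes precedence */

    struct segdesc s = sr[ea >> 28];       /* EA0-EA3 select the segment register */
    if (is_ifetch && (s.n || s.t))
        return XL_ISI_FAULT;               /* no-execute or direct-store fetch */
    return s.t ? XL_DIRECT_STORE : XL_PAGE;
}
```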


7.2.6.2.1 Selection of Page Address Translation 

If SR[T] = 0, page address translation is selected. The information in the segment descriptor 
is then used to generate the 52-bit virtual address. The virtual address is then used to 
identify the page address translation information (stored as page table entries (PTEs) in a 
page table in memory). Once again, although the architecture does not require the existence 
of a TLB, one or more TLBs may be implemented in the hardware to store copies of 
recently-used PTEs on-chip for increased performance. 


If an access hits in the TLB, the page translation occurs and the physical address bits are 
forwarded to the memory subsystem. If the translation is not found in the TLB, the MMU 
requires a search of the page table. The hardware of some implementations may perform 
the table search automatically, while others may trap to an exception handler for the system 
software to perform the page table search. If the translation is found, a new TLB entry is 
created and the page translation is once again attempted. This time, the TLB is guaranteed 
to hit. When the PTE is located, the access is qualified with the appropriate protection bits. 
If the access is determined to be protected (not allowed), an exception (ISI or DSI 
exception) is generated. 


If the PTE is not found by the table search operation, an ISI or DSI exception is generated. 
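The miss, search, refill, and retry sequence can be modeled with a small software TLB. The TLB size, direct-mapped organization, `pt_search_fn` callback, and `toy_search` page table are all hypothetical; real implementations choose their own TLB organization and may perform the search in hardware or in an exception handler. The physical address is formed by concatenating the 20-bit physical page number with the untranslated 12-bit byte offset.

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 16   /* hypothetical size; the architecture leaves this open */

struct tlb_entry { bool valid; uint64_t vpn; uint32_t rpn; };
struct tlb { struct tlb_entry e[TLB_ENTRIES]; };

/* Page table search: returns true and the physical page number if a PTE
   exists for the 40-bit virtual page number. */
typedef bool (*pt_search_fn)(uint64_t vpn, uint32_t *rpn);

enum xl_result { XL_OK, XL_PAGE_FAULT };

/* Toy page table: virtual page 0xABCDEF1234 maps to physical page 0x00042. */
static bool toy_search(uint64_t vpn, uint32_t *rpn)
{
    if (vpn == 0xABCDEF1234ull) { *rpn = 0x00042u; return true; }
    return false;
}

static enum xl_result translate_va(struct tlb *t, uint64_t va, pt_search_fn search,
                                   uint32_t *pa)
{
    uint64_t vpn = va >> 12;
    for (int attempt = 0; attempt < 2; attempt++) {
        struct tlb_entry *slot = &t->e[vpn % TLB_ENTRIES];
        if (slot->valid && slot->vpn == vpn) {
            /* PA = physical page number || 12-bit byte offset */
            *pa = (slot->rpn << 12) | (uint32_t)(va & 0xFFFu);
            return XL_OK;
        }
        uint32_t rpn;
        if (!search(vpn, &rpn))
            return XL_PAGE_FAULT;      /* reported as an ISI or DSI exception */
        slot->valid = true;            /* load the new TLB entry, then retry */
        slot->vpn = vpn;
        slot->rpn = rpn;
    }
    return XL_PAGE_FAULT;              /* not reached: the retry always hits */
}
```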


7.2.6.2.2 Selection of Direct-Store Address Translation 

When the segment descriptor has the T bit set, the access is considered a direct-store access 
and the direct-store interface protocol of the external interface is used to perform the access. 
The selection of address translation type differs for instruction and data accesses only in 
that instruction accesses are not allowed from direct-store segments; attempting to fetch an 
instruction from a direct-store segment causes an ISI exception. 


Note that this facility is not optimized for performance, was present for compatibility with 
POWER devices, and is being removed from the architecture. Future devices are not likely 
to support it; software should not depend on its effects and new software should not use it. 
See Section 7.7, “Direct-Store Segment Address Translation,” for more detailed
information about the translation of addresses in direct-store segments in those processors
that implement it.
7.2.7 MMU Exceptions Summary


To complete any memory access, the effective address must be translated to a physical 
address. A translation exception condition occurs if this translation fails for one of the 
following reasons: 


• There is no valid entry in the page table for the page specified by the effective
address (and segment descriptor) and there is no valid BAT translation.


• There is no valid segment descriptor and there is no valid BAT translation.


• An address translation is found but the access is not allowed by the memory
protection mechanism.


The translation exception conditions cause either the ISI or the DSI exception to be taken 
as shown in Table 7-4. The state saved by the processor for each of these exceptions 
contains information that identifies the address of the failing instruction. Refer to 
Chapter 6, “Exceptions,” for a more detailed description of exception processing, and the 
bit settings of SRR1 and DSISR when an exception occurs. 


Table 7-4. Translation Exception Conditions

Condition                          Description                                   Exception
Page fault (no PTE found)          No matching PTE found in page tables (and     I access: ISI exception, SRR1[1] = 1
                                   no matching BAT array entry)                  D access: DSI exception, DSISR[1] = 1

Block protection violation         Conditions described in Table 7-11 for        I access: ISI exception, SRR1[4] = 1
                                   block                                         D access: DSI exception, DSISR[4] = 1

Page protection violation          Conditions described in Table 7-18 for        I access: ISI exception, SRR1[4] = 1
                                   page                                          D access: DSI exception, DSISR[4] = 1

No-execute protection violation    Attempt to fetch instruction when SR[N] = 1   ISI exception, SRR1[3] = 1

Instruction fetch from             Attempt to fetch instruction when SR[T] = 1   ISI exception, SRR1[3] = 1
direct-store segment (note that
the direct-store facility is
optional and being removed from
the architecture)

Instruction fetch from guarded     Attempt to fetch instruction when             ISI exception, SRR1[3] = 1
memory                             MSR[IR] = 1 and either:
                                   matching xBAT[G] = 1, or
                                   no matching BAT entry and PTE[G] = 1





In addition to the translation exceptions, there are other MMU-related conditions (some of 
them implementation-specific) that can cause an exception to occur. These conditions map 
to the exceptions as shown in Table 7-5. The only MMU exception conditions that occur 
when MSR[DR] = 0 are those that cause the alignment exception for data accesses. For
more detailed information about the conditions that cause the alignment exception (in 
particular for string/multiple instructions), see Section 6.4.6, “Alignment Exception 
(0x00600).” Refer to Chapter 6, “Exceptions,” for a complete description of the SRR1 and 
DSISR bit settings for these exceptions. 


Table 7-5. Other MMU Exception Conditions

Condition                                   Description                                 Exception
dcbz with W = 1 or I = 1 (may cause         dcbz instruction to write-through or        Alignment exception
exception or operation may be performed     cache-inhibited segment or block            (implementation-dependent)
to memory)

lwarx or stwcx. with W = 1 (may cause       Reservation instruction to write-through    DSI exception
exception or execute correctly)             segment or block                            (implementation-dependent),
                                                                                        DSISR[5] = 1

lwarx, stwcx., eciwx, or ecowx              Reservation instruction or external         DSI exception
instruction to direct-store segment (may    control instruction when SR[T] = 1          (implementation-dependent),
cause exception or may produce                                                          DSISR[5] = 1
boundedly-undefined results); note that
the direct-store facility is optional and
being removed from the architecture

Floating-point load or store to             Floating-point memory access when           Alignment exception
direct-store segment (may cause             SR[T] = 1                                   (implementation-dependent)
exception or instruction may execute
correctly); note that the direct-store
facility is optional and being removed
from the architecture

Load or store operation that causes a       Direct-store interface protocol signalled   DSI exception, DSISR[0] = 1
direct-store error; note that the           with an error condition
direct-store facility is optional and
being removed from the architecture

eciwx or ecowx attempted when external      eciwx or ecowx attempted with EAR[E] = 0    DSI exception, DSISR[11] = 1
control facility disabled

lmw, stmw, lswi, lswx, stswi, or stswx      lmw, stmw, lswi, lswx, stswi, or stswx      Alignment exception
instruction attempted in little-endian      instruction attempted while MSR[LE] = 1
mode

Operand misalignment                        Translation enabled and operand is          Alignment exception (some of
                                            misaligned as described in Chapter 6,       these cases are
                                            “Exceptions.”                               implementation-dependent)
7.2.8 MMU Instructions and Register Summary 


The MMU instructions and registers allow the operating system to set up the segment 
descriptors. Additionally, the operating system has the resources to set up the block address 
translation areas and the page tables in memory. 


Note that because the implementation of TLBs is optional, the instructions that refer to 
these structures are also optional. However, as these structures serve as caches of the page 
table, there must be a software protocol for maintaining coherency between these caches 
and the tables in memory whenever the tables in memory are modified. Therefore, the 
PowerPC OEA specifies that a processor implementing a TLB is guaranteed to have a 
means for doing the following: 


• Invalidating an individual TLB entry 
• Invalidating the entire TLB 


When the tables in memory are changed, the operating system purges these caches of the 
corresponding entries, allowing the translation caching mechanism to refetch from the 
tables when the corresponding entries are required. 


A processor may implement one or more of the instructions described in this section to 
support table invalidation. Alternatively, an algorithm may be specified that performs one 
of the functions listed above (a loop invalidating individual TLB entries may be used to 
invalidate the entire TLB, for example), or different instructions may be provided. 
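The loop alternative can be sketched as a small C model. Everything here is a hypothetical software model rather than processor code: the TLB geometry (32 congruence classes, 2 ways), the index function, and the names `model_tlbie` and `flush_tlb_by_loop` are assumptions. The model also assumes that tlbie purges the whole congruence class indexed by the EA (one of the implementation options the architecture allows), so issuing it once per congruence class clears every entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TLB geometry; real implementations differ. */
#define TLB_SETS   32u /* congruence classes */
#define TLB_WAYS   2u
#define PAGE_SHIFT 12u /* 4-Kbyte pages */

typedef struct {
    bool valid;
    uint32_t vpn; /* tag; irrelevant to invalidation */
} tlb_entry;

typedef struct {
    tlb_entry set[TLB_SETS][TLB_WAYS];
} tlb_model;

/* Congruence class selected by the low-order page-number bits of the EA. */
static unsigned tlb_index(uint32_t ea) {
    return (ea >> PAGE_SHIFT) % TLB_SETS;
}

/* Models a tlbie that purges every way of the congruence class indexed by ea. */
static void model_tlbie(tlb_model *tlb, uint32_t ea) {
    assert(tlb != NULL);
    unsigned idx = tlb_index(ea);
    for (unsigned w = 0; w < TLB_WAYS; w++)
        tlb->set[idx][w].valid = false;
}

/* Emulates tlbia: one tlbie per congruence class, stepping one page at a time. */
static void flush_tlb_by_loop(tlb_model *tlb) {
    for (uint32_t i = 0; i < TLB_SETS; i++)
        model_tlbie(tlb, i << PAGE_SHIFT);
}

/* Counts the entries that are still valid. */
static unsigned tlb_valid_count(const tlb_model *tlb) {
    unsigned n = 0;
    for (unsigned s = 0; s < TLB_SETS; s++)
        for (unsigned w = 0; w < TLB_WAYS; w++)
            n += tlb->set[s][w].valid;
    return n;
}
```

On a real processor the loop bound would come from the implementation's user's manual, which is one reason the text recommends hiding such code behind a subroutine.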


In implementing some of these instructions, a processor may also perform additional functions 
not described here. For example, the tlbie instruction may be implemented so as to purge all 
TLB entries in a congruence class (that is, all TLB entries indexed by the specified EA, which 
can include corresponding entries in data and instruction TLBs) or the entire TLB. 


Note that if a processor does not implement an optional instruction, it treats the instruction 
as a no-op or as an illegal instruction, depending on the implementation. Also, note that the 
segment register and TLB concepts described here are conceptual; that is, a processor may 
implement parallel sets of segment registers (and even TLBs) for instructions and data. 


Because the MMU specification for PowerPC processors is so flexible, it is recommended 
that the software that uses these instructions and registers be encapsulated into subroutines 
to minimize the impact of migrating across the family of implementations. 


Table 7-6 summarizes the PowerPC instructions that specifically control the MMU. For 
more detailed information about the instructions, refer to Chapter 8, “Instruction Set.” 


Table 7-6. Instruction Summary—Control MMU

Instruction | Mnemonic and Operands | Description
Move to Segment Register | mtsr SR,rS | SR[SR] ← rS. 32-bit implementations only.
Move to Segment Register Indirect | mtsrin rS,rB | SR[rB[0-3]] ← rS. 32-bit implementations only.
Move from Segment Register | mfsr rD,SR | rD ← SR[SR]. 32-bit implementations only.
Move from Segment Register Indirect | mfsrin rD,rB | rD ← SR[rB[0-3]]. 32-bit implementations only.
Translation Lookaside Buffer Invalidate All (optional) | tlbia | For all TLB entries, TLB[V] ← 0. Causes invalidation of TLB entries only for the processor that executed the tlbia.
Translation Lookaside Buffer Invalidate Entry (optional) | tlbie rB | If TLB hit (for the effective address specified as rB), TLB[V] ← 0. Causes TLB invalidation of the entry in all processors in the system.
Translation Lookaside Buffer Synchronize (optional) | tlbsync | Ensures that all tlbie instructions previously executed by the processor executing the tlbsync instruction have completed on all processors.
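A minimal C model of the four segment register instructions above, assuming the register file can be represented as a plain array. The names `sr_file`, `model_mtsr`, `model_mtsrin`, `model_mfsr`, and `model_mfsrin` are hypothetical, not a real API. The detail worth showing is that rB[0-3], in PowerPC bit numbering where bit 0 is the most-significant bit, is `rb >> 28`.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the sixteen 32-bit segment registers (SR0-SR15). */
typedef struct {
    uint32_t sr[16];
} sr_file;

/* rB[0-3] in PowerPC bit numbering are the four most-significant bits. */
static unsigned sr_index_from_rb(uint32_t rb) {
    return rb >> 28;
}

/* mtsr SR,rS: the SR number is an immediate field of the instruction. */
static void model_mtsr(sr_file *f, unsigned sr, uint32_t rs) {
    assert(sr < 16);
    f->sr[sr] = rs;
}

/* mtsrin rS,rB: the SR number comes from the high-order bits of rB. */
static void model_mtsrin(sr_file *f, uint32_t rs, uint32_t rb) {
    f->sr[sr_index_from_rb(rb)] = rs;
}

/* mfsr rD,SR */
static uint32_t model_mfsr(const sr_file *f, unsigned sr) {
    assert(sr < 16);
    return f->sr[sr];
}

/* mfsrin rD,rB */
static uint32_t model_mfsrin(const sr_file *f, uint32_t rb) {
    return f->sr[sr_index_from_rb(rb)];
}
```

Because mtsrin takes the SR number from the top four bits of rB, software can place an effective address directly in rB to reach the segment register that covers that address.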





Table 7-7 summarizes the registers that the operating system uses to program the MMU. 
These registers are accessible to supervisor-level software only (supervisor level is referred 
to as privileged state in the architecture specification). These registers are described in 
detail in Chapter 2, “PowerPC Register Set.” 


Table 7-7. MMU Registers

Register | Description
Segment registers (SR0-SR15) | The sixteen 32-bit segment registers are present only in 32-bit implementations of the PowerPC architecture. Figure 7-13 shows the format of a segment register. The fields in the segment register are interpreted differently depending on the value of bit 0. The segment registers are accessed by the mtsr, mtsrin, mfsr, and mfsrin instructions.
BAT registers (IBAT0U-IBAT3U, IBAT0L-IBAT3L, DBAT0U-DBAT3U, and DBAT0L-DBAT3L) | There are 16 BAT registers, organized as four pairs of instruction BAT registers (IBAT0U-IBAT3U paired with IBAT0L-IBAT3L) and four pairs of data BAT registers (DBAT0U-DBAT3U paired with DBAT0L-DBAT3L). The BAT registers are defined as 32-bit registers in 32-bit implementations. These are special-purpose registers that are accessed by the mtspr and mfspr instructions.
SDR1 register | The SDR1 register specifies the base and size of the page tables in memory. SDR1 is defined as a 32-bit register for 32-bit implementations. This is a special-purpose register that is accessed by the mtspr and mfspr instructions.





7.2.9 TLB Entry Invalidation 


Optionally, PowerPC processors implement TLB structures that store on-chip copies of the 
PTEs that are resident in physical memory. These processors have the ability to invalidate 
resident TLB entries through the use of the tlbie and tlbia instructions. Additionally, these 
instructions may also enable a TLB invalidate signalling mechanism in hardware so that 
other processors also invalidate their resident copies of the matching PTE. See Chapter 8, 
“Instruction Set,” for detailed information about the tlbie and tlbia instructions. 
7.3 Real Addressing Mode 


If address translation is disabled (MSR[IR] = 0 or MSR[DR] = 0) for a particular access, 
the effective address is treated as the physical address and is passed directly to the memory 
subsystem as a real addressing mode address translation. If an implementation has a smaller 
physical address range than effective address range, the extra high-order bits of the effective 
address may be ignored in the generation of the physical address. 
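As a sketch, the real-mode mapping reduces to a mask. The `pa_bits` parameter is a hypothetical stand-in for an implementation's physical address width, and the model assumes the implementation takes the option of ignoring the extra high-order EA bits.

```c
#include <assert.h>
#include <stdint.h>

/* Real addressing mode: PA = EA, except that an implementation with fewer than
 * 32 physical address bits may ignore the extra high-order EA bits.
 * pa_bits is a hypothetical, implementation-specific parameter. */
static uint32_t real_mode_pa(uint32_t ea, unsigned pa_bits) {
    assert(pa_bits >= 1 && pa_bits <= 32);
    if (pa_bits == 32)
        return ea;                      /* avoids the undefined shift 1u << 32 */
    return ea & ((1u << pa_bits) - 1u); /* drop the unsupported high-order bits */
}
```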


Section 2.3.17, “Synchronization Requirements for Special Registers and for Lookaside 
Buffers,” describes the synchronization requirements for changes to MSR[IR] and 
MSR[DR]. 


The addresses for accesses that occur in real addressing mode bypass all memory protection 
checks as described in Section 7.4.4, “Block Memory Protection,” and Section 7.5.4, “Page 
Memory Protection” and do not cause the recording of referenced and changed information 
(described in Section 7.5.3, “Page History Recording”). 


For data accesses that use real addressing mode, the memory access mode bits (WIMG) are 
assumed to be 0b0011. That is, the cache is write-back and memory does not need to be 
updated immediately (W = 0), caching is enabled (I = 0), data coherency is enforced with 
memory, I/O, and other processors (caches) (M = 1, so data is global), and the memory is 
guarded. For instruction accesses in real addressing mode, the memory access mode bits 
(WIMG) are assumed to be either 0b0001 or 0b0011. That is, caching is enabled (I = 0) and 
the memory is guarded. Additionally, coherency may or may not be enforced with memory, 
I/O, and other processors (caches) (M = 0 or 1, so data may or may not be considered 
global). For a complete description of the WIMG bits, refer to Section 5.2.1, 
“Memory/Cache Access Attributes.” 
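The default WIMG values can be expressed as a 4-bit constant written in the order W, I, M, G, so 0b0011 means M = 1 and G = 1. The enum names and functions below are illustrative assumptions; the `coherent` parameter models the implementation-dependent choice of M for instruction fetches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* WIMG as a 4-bit value written 0bWIMG, so W is the most-significant bit. */
enum {
    WIMG_W = 0x8, /* write-through */
    WIMG_I = 0x4, /* caching-inhibited */
    WIMG_M = 0x2, /* memory coherence */
    WIMG_G = 0x1  /* guarded */
};

/* Data accesses in real addressing mode: 0b0011. */
static uint8_t real_mode_data_wimg(void) {
    return WIMG_M | WIMG_G;
}

/* Instruction accesses: 0b0001 or 0b0011, depending on the implementation. */
static uint8_t real_mode_insn_wimg(bool coherent) {
    return (uint8_t)((coherent ? WIMG_M : 0) | WIMG_G);
}

/* Tests one attribute bit of a 4-bit WIMG value. */
static bool wimg_has(uint8_t wimg, uint8_t bit) {
    assert((wimg & ~0xFu) == 0);
    return (wimg & bit) != 0;
}
```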


Note that the attempted execution of the eciwx or ecowx instructions while MSR[DR] = 0 
causes boundedly-undefined results. 


Whenever an exception occurs, the processor clears both the MSR[IR] and MSR[DR] bits. 
Therefore, at least at the beginning of all exception handlers (including reset), the processor 
operates in real addressing mode for instruction and data accesses. If address translation is 
required for the exception handler code, the software must explicitly enable address 
translation by accessing the MSR as described in Chapter 2, “PowerPC Register Set.” 


Note that an attempt to access a physical address that is not physically present in the system 
may cause a machine check exception (or even a checkstop condition), depending on the 
response by the system for this case. Thus, care must be taken when generating addresses 
in real addressing mode. Note that this can also occur when translation is enabled and the 
SDR1 register sets up the translation such that nonexistent memory is accessed. See 
Section 6.4.2, “Machine Check Exception (0x00200),” for more information on machine 
check exceptions. 
7.4 Block Address Translation 


The block address translation (BAT) mechanism in the OEA provides a way to map ranges 
of effective addresses larger than a single page into contiguous areas of physical memory. 
Such areas can be used for data that is not subject to normal virtual memory handling 
(paging), such as a memory-mapped display buffer or an extremely large array of numerical 
data. 


The following sections describe the implementation of block address translation in 
PowerPC processors, including the block protection mechanism, followed by a block 
translation summary with a detailed flow diagram. 


7.4.1 BAT Array Organization 


The block address translation mechanism in PowerPC processors is implemented as a 
software-controlled BAT array. The BAT array maintains the address translation 
information for eight blocks of memory. The BAT array in PowerPC processors is 
maintained by the system software and is implemented as a set of 16 special-purpose 
registers (SPRs). Each block is defined by a pair of SPRs called upper and lower BAT 
registers that contain the effective and physical addresses for the block. 


The BAT registers can be read from or written to by the mfspr and mtspr instructions; 
access to the BAT registers is privileged. Section 7.4.3, “BAT Register Implementation of 
BAT Array,” gives more information about the BAT registers. Note that the BAT array 
entries are completely ignored for TLB invalidate operations detected in hardware and in 
the execution of the tlbie or tlbia instruction. 


Figure 7-5 shows the organization of the BAT array. Four pairs of BAT registers are 
provided for translating instruction addresses and four pairs of BAT registers are used for 
translating data addresses. These eight pairs of BAT registers comprise two four-entry 
fully-associative BAT arrays (each BAT array entry corresponds to a pair of BAT registers). 
The BAT array is fully-associative in that any address can reside in any BAT. In addition, 
the effective address field of all four corresponding entries (instruction or data) is 
simultaneously compared with the effective address of the access to check for a match. 


7-20 PowerPC Microprocessor Family: The Programming Environments (32-Bit) 


[Figure: Instruction accesses: the unmasked bits of EA0-EA14 and MSR[PR] are compared in parallel against the BEPI, Vs, and Vp fields of the four instruction BAT pairs, IBAT0U/IBAT0L (SPR 528) through IBAT3U/IBAT3L (SPR 535), producing a BAT array hit/miss. Data accesses: the same comparison is made against the four data BAT pairs, DBAT0U/DBAT0L (SPR 536) through DBAT3U/DBAT3L (SPR 543).]

Figure 7-5. BAT Array Organization 


Each pair of BAT registers defines the starting address of a block in the effective address 
space, the size of the block, and the start of the corresponding block in physical address 
space. If an effective address is within the range defined by a pair of BAT registers, its 
physical address is defined as the starting physical address of the block plus the low-order 
effective address bits. 


Blocks are restricted to a finite set of sizes, from 128 Kbytes (2^17 bytes) to 256 Mbytes (2^28 
bytes). The starting address of a block in both effective address space and physical address 
space is defined as a multiple of the block size. 


It is an error for system software to program the BAT registers such that an effective address 
is translated by more than one valid IBAT pair or more than one valid DBAT pair. If this 
occurs, the results are undefined and may include a spurious violation of the memory 
protection mechanism, a machine check exception, or a checkstop condition. 


The equation for determining whether a BAT entry is valid for a particular access is as 
follows: 


BAT_entry_valid = (Vs & ¬MSR[PR]) | (Vp & MSR[PR]) 
If a BAT entry is not valid for a given access, it does not participate in address translation 
for that access. Two BAT entries may not map an overlapping effective address range and 
be valid at the same time. 


Entries that have complementary settings of Vs and Vp may map overlapping effective 
address blocks. Complementary settings would be as follows: 


BAT entry A: Vs = 1, Vp = 0 
BAT entry B: Vs = 0, Vp = 1 
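The validity equation and the overlap rule can be checked exhaustively in a few lines of C. This is a hypothetical sketch: `bat_valid_bits` and `bats_may_overlap` are illustrative names, and the overlap check simply tests that no value of MSR[PR] makes both entries valid at once, which is exactly what complementary Vs/Vp settings guarantee.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool vs; /* supervisor mode valid */
    bool vp; /* user mode valid */
} bat_valid_bits;

/* BAT_entry_valid = (Vs & ~MSR[PR]) | (Vp & MSR[PR]) */
static bool bat_entry_valid(bat_valid_bits b, bool msr_pr) {
    return (b.vs && !msr_pr) || (b.vp && msr_pr);
}

/* Two entries may map overlapping EA ranges only if no value of MSR[PR]
 * makes both of them valid at the same time. */
static bool bats_may_overlap(bat_valid_bits a, bat_valid_bits b) {
    for (int pr = 0; pr <= 1; pr++)
        if (bat_entry_valid(a, pr) && bat_entry_valid(b, pr))
            return false;
    return true;
}
```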


7.4.2 Recognition of Addresses in BAT Arrays 


The BAT arrays are accessed in parallel with segmented address translation to determine 
whether a particular effective address corresponds to a block defined by the BAT arrays. If 
an effective address is within a valid BAT area, the physical address for the memory access 
is determined as described in Section 7.4.5, “Block Physical Address Generation.” 


Block address translation is enabled only when address translation is enabled 
(MSR[IR] = 1 and/or MSR[DR] = 1). Also, a matching BAT array entry always takes 
precedence over any segment descriptor translation, independent of the setting of the 
SR[T] bit, and the segment descriptor information is completely ignored. 


Figure 7-6 shows the flow of the BAT array comparison used in block address translation. 
When an instruction fetch operation is required, the effective address is compared with the 
four instruction BAT array entries; similarly, the effective addresses of data accesses are 
compared with the four data BAT array entries. The BAT arrays are fully-associative in that 
any of the four instruction or data BAT array entries can contain a matching entry (for an 
instruction or data access, respectively). 


Note that Figure 7-6 assumes that the protection bits, BATL[PP], allow an access to occur. 
If not, an exception is generated, as described in Section 7.4.4, “Block Memory 
Protection.” 
Figure 7-6. BAT Array Hit/Miss Flow 


Two BAT array entry fields are compared to determine if there is a BAT array hit—a block 
effective page index (BEPI) field, which is compared with the high-order effective address 
bits, and one of two valid bits (Vs or Vp), which is evaluated relative to the value of 
MSR[PR]. Note that the figure assumes a block size of 128 Kbytes (all bits of BEPI are used 
in the comparison); the actual number of bits of the BEPI field that are used are masked by 
the BL field (block length) as described in Section 7.4.3, “BAT Register Implementation of 
BAT Array.” 
Thus, the specific criteria for determining a BAT array hit are as follows: 


• The high-order 15 bits of the effective address (EA0-EA14), subject to a mask, must match the 
BEPI field of the BAT array entry. 


• The appropriate valid bit in the BAT array entry must be set to one as follows: 


— MSR[PR] = 0 corresponds to supervisor mode; in this mode, Vs is checked. 
— MSR[PR] = 1 corresponds to user mode; in this mode, Vp is checked. 
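The two criteria can be sketched for a single entry, assuming the upper BAT register is held in a `uint32_t` with PowerPC bit 0 as the most-significant bit (field positions as in Figure 7-7: BEPI bits 0-14, BL bits 19-29, Vs bit 30, Vp bit 31). The helper names are hypothetical, and real hardware compares all four entries in parallel rather than calling a function per entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper BAT register fields (PowerPC bit 0 is the MSB). */
static uint32_t batu_bepi(uint32_t batu) { return batu >> 17; }           /* bits 0-14 */
static uint32_t batu_bl(uint32_t batu)   { return (batu >> 2) & 0x7FFu; } /* bits 19-29 */
static bool     batu_vs(uint32_t batu)   { return (batu >> 1) & 1u; }     /* bit 30 */
static bool     batu_vp(uint32_t batu)   { return batu & 1u; }            /* bit 31 */

/* Hit test for one entry: EA0-EA14, with the bits selected by ones in BL
 * cleared, must equal BEPI, and the valid bit chosen by MSR[PR] must be set.
 * BL's rightmost bit lines up with EA bit 14, the low bit of ea >> 17. */
static bool bat_hit(uint32_t batu, uint32_t ea, bool msr_pr) {
    uint32_t ea_hi = ea >> 17; /* EA0-EA14 as a 15-bit value */
    uint32_t masked = ea_hi & ~batu_bl(batu) & 0x7FFFu;
    bool valid = msr_pr ? batu_vp(batu) : batu_vs(batu);
    return valid && masked == batu_bepi(batu);
}
```

For example, an upper BAT value of 0x80001FFE (BEPI = 0x4000, BL = 0x7FF, Vs = 1, Vp = 0) hits for any supervisor access in 0x80000000-0x8FFFFFFF and is ignored for user accesses.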


The matching entry is then subject to the protection checking described in Section 7.4.4, 
“Block Memory Protection,” before it is used as the source for the physical address. Note 
that if, for example, a user mode program performs an access with an effective address that matches the 
BEPI field of a BAT area defined as valid only for supervisor accesses (Vp = 0 and Vs = 1), 
the BAT mechanism does not generate a protection violation and the BAT 
entry is simply ignored. Thus, a supervisor program can use the block address translation 
mechanism to share a portion of the effective address space with a user program (that uses 
page address translation for this area). 


If a memory area is to be mapped by the BAT mechanism for both instruction and data 
accesses, the mapping must be set up in both an IBAT and DBAT entry; this is the case even 
on implementations that do not have separate instruction and data caches. 


Note that a block can be defined to overlay part of a segment such that the block portion is 
nonpaged although the rest of the segment can be paged. This allows nonpaged areas to be 
specified within a segment. Thus, if an area of memory is translated by an instruction BAT 
entry and data accesses are not also required to that same area of memory, PTEs are not 
required for that area of memory. Similarly, if an area of memory is translated by a data 
BAT entry, and instruction accesses are not also required to that same area of memory, PTEs 
are not required for that area of memory. 


7.4.3 BAT Register Implementation of BAT Array 


Recall that the BAT array consists of four entries used for instruction accesses and four 
entries used for data accesses. Each BAT array entry consists of a pair of BAT registers—an 
upper and a lower BAT register for each entry. The BAT registers are accessed with the 
mtspr and mfspr instructions and are only accessible to supervisor-level programs. See 
Appendix F, “Simplified Mnemonics,” for a list of simplified mnemonics for use with the 
BAT registers. (Note that simplified mnemonics are referred to as extended mnemonics in 
the architecture specification.) 


The format and bit definitions of the upper and lower BAT registers are shown in Figure 7-7 
and Figure 7-8, respectively. 
[Figure: upper BAT register fields: BEPI (bits 0-14), reserved (bits 15-18), BL (bits 19-29), Vs (bit 30), Vp (bit 31).]


Figure 7-7. Format of Upper BAT Registers 


[Figure: lower BAT register fields: BRPN (bits 0-14), reserved (bits 15-24), WIMG* (bits 25-28), reserved (bit 29), PP (bits 30-31).]


*W and G bits are not defined for IBAT registers. Attempting to write to these bits causes boundedly-undefined results. 


Figure 7-8. Format of Lower BAT Registers 


The BAT registers contain the effective-to-physical address mappings for blocks of 
memory. This mapping information includes the effective address bits that are compared 
with the effective address of the access, the memory/cache access mode bits (WIMG), and 
the protection bits for the block. In addition, the size of the block and the starting address 
of the block are defined by the physical block number (BRPN) and block size mask (BL) fields. 


Table 7-8 describes the bits in the upper and lower BAT registers. Note that the W and G 
bits are defined only for BAT registers that translate data accesses (DBAT registers); attempting 
to write to the W and G bits in IBAT registers causes boundedly-undefined results. 
The BL field in the upper BAT register is a mask that encodes the size of the block. 
Table 7-8. BAT Registers—Field and Bit Descriptions

Upper/Lower BAT | Bits | Name | Description
Upper BAT register | 0-14 | BEPI | Block effective page index. This field is compared with high-order bits of the logical address to determine if there is a hit in that BAT array entry. (Note that the architecture specification refers to logical address as effective address.)
 | 15-18 | (none) | Reserved
 | 19-29 | BL | Block length. BL is a mask that encodes the size of the block. Values for this field are listed in Table 2-12.
 | 30 | Vs | Supervisor mode valid bit. This bit interacts with MSR[PR] to determine if there is a match with the logical address. For more information, see Section 7.4.2, “Recognition of Addresses in BAT Arrays.”
 | 31 | Vp | User mode valid bit. This bit also interacts with MSR[PR] to determine if there is a match with the logical address. For more information, see Section 7.4.2, “Recognition of Addresses in BAT Arrays.”
Lower BAT register | 0-14 | BRPN | This field is used in conjunction with the BL field to generate high-order bits of the physical address of the block.
 | 15-24 | (none) | Reserved
 | 25-28 | WIMG | Memory/cache access mode bits: W (write-through), I (caching-inhibited), M (memory coherence), G (guarded). Attempting to write to the W and G bits in IBAT registers causes boundedly-undefined results. For detailed information about the WIMG bits, see Section 5.2.1, “Memory/Cache Access Attributes.”
 | 29 | (none) | Reserved
 | 30-31 | PP | Protection bits for block. This field determines the protection for the block as described in Section 7.4.4, “Block Memory Protection.”





Table 7-9 defines the bit encodings for the BL field of the upper BAT register. 


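Table 7-9. Upper BAT Register Block Size Mask Encodings

Block Size | BL Encoding
128 Kbytes | 000 0000 0000
256 Kbytes | 000 0000 0001
512 Kbytes | 000 0000 0011
1 Mbyte | 000 0000 0111
2 Mbytes | 000 0000 1111
4 Mbytes | 000 0001 1111
8 Mbytes | 000 0011 1111
16 Mbytes | 000 0111 1111
32 Mbytes | 000 1111 1111
64 Mbytes | 001 1111 1111
128 Mbytes | 011 1111 1111
256 Mbytes | 111 1111 1111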


Only the values shown in Table 7-9 are valid for BL. An effective address is determined to 
be within a BAT area if the appropriate bits (determined by the BL field) of the effective 
address match the value in the BEPI field of the upper BAT register, and if the appropriate 
valid bit (Vs or Vp) is set. Note that for an access to occur, the protection bits (PP bits) in 
the lower BAT register must be set appropriately, as described in Section 7.4.4, “Block 
Memory Protection.” 





The number of zeros in the BL field determines the bits of the effective address that are used 
in the comparison with the BEPI field to determine if there is a hit in that BAT array entry. 
The rightmost bit of the BL field is aligned with bit 14 of the effective address; bits of the 
effective address corresponding to ones in the BL field are then cleared to zero for the 
comparison. 


The value loaded into the BL field determines both the size of the block and the alignment 
of the block in both effective address space and physical address space. The values loaded 
into the BEPI and BRPN fields must have at least as many low-order zeros as there are ones 
in BL. Otherwise, the results are undefined. Also, if the processor does not support 32 bits 
of physical address, software should write zeros to those unsupported bits in the BRPN field 
(as the implementation treats them as reserved). Otherwise, a machine check exception can 
occur. 
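Because each valid BL encoding is a run of low-order ones, the block size and the alignment rule can both be computed directly. This is a minimal sketch with hypothetical helper names; BEPI and BRPN are taken as the raw 15-bit fields, and BL as the 11-bit field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Valid BL encodings are contiguous low-order ones: 0x000, 0x001, 0x003, ..., 0x7FF. */
static bool bl_is_valid(uint32_t bl) {
    return bl <= 0x7FFu && (bl & (bl + 1u)) == 0;
}

/* Block size in bytes: 128 Kbytes, doubled for each one in BL. */
static uint32_t bat_block_size(uint32_t bl) {
    assert(bl_is_valid(bl));
    return (bl + 1u) << 17;
}

/* BEPI and BRPN must have at least as many low-order zeros as BL has ones,
 * which aligns the block to its own size in both address spaces. */
static bool bat_fields_aligned(uint32_t bepi, uint32_t brpn, uint32_t bl) {
    return (bepi & bl) == 0 && (brpn & bl) == 0;
}
```

For example, BL = 0x7FF gives (0x7FF + 1) << 17 = 0x10000000 bytes (256 Mbytes), so BEPI and BRPN must then have their low 11 bits clear.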


7.4.4 Block Memory Protection 


After an effective address is determined to be within a block defined by the BAT array, the 
access is validated by the memory protection mechanism. If this protection mechanism 
prohibits the access, a block protection violation exception condition (DSI or ISI exception) 
is generated. 


The memory protection mechanism allows selectively granting read access, granting 
read/write access, and prohibiting access to areas of memory based on a number of control 
criteria. The block protection mechanism provides protection at the granularity defined by 
the block size (128 Kbyte to 256 Mbyte). 
Because the memory protection mechanisms used by block and page address translation differ, 
refer to Section 7.5.4, “Page Memory Protection,” for information specific to page address 
translation. 


For block address translation, the memory protection mechanism is controlled by the PP 
bits (which are located in the lower BAT register), which define the access options for the 
block. Table 7-10 shows the types of accesses that are allowed for the possible PP bit 
combinations. 


Table 7-10. Access Protection Control for Blocks

PP | Accesses Allowed
00 | No access
x1 | Read only
10 | Read/write





Thus, any access attempted (read or write) when PP = 00 results in a protection violation 
exception condition. When PP = x1, an attempt to perform a write access causes a 
protection violation exception condition, and when PP = 10, all accesses are allowed. When 
the memory protection mechanism prohibits a reference, one of the following occurs, 
depending on the type of access that was attempted: 


• For data accesses, a DSI exception is generated and bit 4 of DSISR is set. 
• For instruction accesses, an ISI exception is generated and SRR1 bit 4 is set. 


See Chapter 6, “Exceptions,” for more information about these exceptions. 
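The PP decoding in Table 7-10 fits in a small predicate. This is an illustrative C sketch with a hypothetical function name; it answers only whether an access is permitted and leaves exception reporting to the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* PP encodings for blocks: 00 no access, x1 (01 or 11) read only, 10 read/write. */
static bool bat_pp_allows(unsigned pp, bool is_write) {
    assert(pp <= 3u);
    if (pp == 0u)
        return false;     /* 00: every access is a protection violation */
    if (pp & 1u)
        return !is_write; /* x1: reads allowed, writes violate */
    return true;          /* 10: reads and writes allowed */
}
```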


Table 7-11 shows a summary of the conditions that cause exceptions for supervisor and 
user read and write accesses within a BAT area. Each BAT array entry is programmed to be 
either used or ignored for supervisor and user accesses via the BAT array entry valid bits, 
and the PP bits enforce the read/write protection options. Note that the valid bits (Vs and 
Vp) are used as part of the match criteria for a BAT array entry and are not explicitly part 
of the protection mechanism. 
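Table 7-11. Access Protection Summary for BAT Array

Vs | Vp | PP | Block Type | User Read | User Write | Supervisor Read | Supervisor Write
0 | 0 | xx | None | Not used | Not used | Not used | Not used
0 | 1 | 00 | No-access | Exception | Exception | Not used | Not used
0 | 1 | x1 | Read-only | Read | Exception | Not used | Not used
0 | 1 | 10 | Read/write | Read | Write | Not used | Not used
1 | 0 | 00 | No-access | Not used | Not used | Exception | Exception
1 | 0 | x1 | Read-only | Not used | Not used | Read | Exception
1 | 0 | 10 | Read/write | Not used | Not used | Read | Write
1 | 1 | 00 | No-access | Exception | Exception | Exception | Exception
1 | 1 | x1 | Read-only | Read | Exception | Read | Exception
1 | 1 | 10 | Read/write | Read | Write | Read | Write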


Note: The term ‘Not used’ implies that the access is not translated by the BAT array and is translated by the 
page address translation mechanism described in Section 7.5, “Memory Segment Model,” instead. 
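Each cell of Table 7-11 follows from two rules: the valid bit selected by the privilege level decides whether the entry is used at all, and PP then decides permission. The hypothetical C function below encodes that reasoning so the table can be regenerated or checked; it is a sketch, not a description of hardware beyond what the table states.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { BAT_NOT_USED, BAT_ACCESS_OK, BAT_EXCEPTION } bat_outcome;

/* One cell of Table 7-11. user corresponds to MSR[PR] = 1. */
static bat_outcome bat_table_7_11(bool vs, bool vp, unsigned pp,
                                  bool user, bool is_write) {
    assert(pp <= 3u);
    bool valid = user ? vp : vs; /* MSR[PR] = 1 selects Vp, 0 selects Vs */
    if (!valid)
        return BAT_NOT_USED;     /* translated by the page mechanism instead */
    bool allowed = (pp == 2u) || ((pp & 1u) && !is_write);
    return allowed ? BAT_ACCESS_OK : BAT_EXCEPTION;
}
```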





Note that because access to the BAT registers is privileged, only supervisor programs can 
modify the protection and valid bits for the block. 
Figure 7-9 expands on the actions taken by the processor in the case of a memory protection 
violation. Note that the dcbt and dcbtst instructions do not cause exceptions; in the case of 
a memory protection violation for the attempted execution of one of these instructions, the 
translation is aborted and the instruction executes as a no-op (no violation is reported). 
Refer to Chapter 6, “Exceptions,” for a complete description of the SRR1 and DSISR bit 
settings for the protection violation exceptions. 


[Figure: block memory protection violation (from Figure 7-11). For a dcbt or dcbtst instruction, the access is aborted. Otherwise, for an instruction access, SRR1[4] ← 1 and an ISI exception is taken; for a data access, DSISR[4] ← 1 and a DSI exception is taken.]


Figure 7-9. Memory Protection Violation Flow for Blocks 
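The decision in Figure 7-9 can be modelled as follows. The sketch assumes PowerPC bit numbering (bit 0 is the most-significant bit of a 32-bit register), so "bit 4" is the mask 0x08000000. The enum and function names are hypothetical, and only the bit the flow sets is modelled, not the full SRR1 or DSISR contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PowerPC numbers bits from the most-significant end: bit 0 is the MSB. */
#define PPC_BIT32(n) (UINT32_C(1) << (31 - (n)))

typedef enum { ACC_INSTRUCTION, ACC_DATA, ACC_DCBT_DCBTST } access_kind;
typedef enum { OUT_NO_OP, OUT_ISI, OUT_DSI } violation_outcome;

/* Block protection violation flow: dcbt/dcbtst are aborted silently;
 * otherwise bit 4 is set in SRR1 (ISI) or in DSISR (DSI). */
static violation_outcome block_protection_violation(access_kind kind,
                                                    uint32_t *srr1,
                                                    uint32_t *dsisr) {
    assert(srr1 != NULL && dsisr != NULL);
    switch (kind) {
    case ACC_DCBT_DCBTST:
        return OUT_NO_OP; /* translation aborted, no violation reported */
    case ACC_INSTRUCTION:
        *srr1 |= PPC_BIT32(4);
        return OUT_ISI;
    case ACC_DATA:
    default:
        *dsisr |= PPC_BIT32(4);
        return OUT_DSI;
    }
}
```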
7.4.5 Block Physical Address Generation 


Access to the physical memory within the block is made according to the memory/cache 
access mode defined by the WIMG bits in the lower BAT register. These bits apply to the 
entire block rather than to an individual page as described in Section 5.2.1, 
“Memory/Cache Access Attributes.” 


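[Figure: the effective address is divided into a 4-bit field (EA0-EA3), an 11-bit field (EA4-EA14), and a 17-bit field (EA15-EA31). The 11-bit field is ANDed with the block size mask (BL) and ORed with the corresponding 11 bits of the physical block number (BRPN4-BRPN14). BRPN0-BRPN3 form the 4-bit field and EA15-EA31 pass through unchanged as the 17-bit field of the physical address.]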


Figure 7-10. Block Physical Address Generation 
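The AND/OR combination in Figure 7-10 can be written as one expression on 32-bit values. This is a minimal sketch: `bat_physical_address` is a hypothetical name, `brpn` is the raw 15-bit field from the lower BAT register, and `bl` is the 11-bit mask from the upper BAT register.

```c
#include <assert.h>
#include <stdint.h>

/* PA = BRPN[0-3] || (BRPN[4-14] | (EA[4-14] & BL)) || EA[15-31] */
static uint32_t bat_physical_address(uint32_t brpn, uint32_t bl, uint32_t ea) {
    assert(brpn <= 0x7FFFu && bl <= 0x7FFu);
    uint32_t hi4   = (brpn >> 11) & 0xFu;                 /* BRPN0-BRPN3 pass through */
    uint32_t mid11 = (brpn & 0x7FFu) | ((ea >> 17) & bl); /* BRPN4-14 OR (EA4-14 & BL) */
    uint32_t lo17  = ea & 0x1FFFFu;                       /* EA15-EA31 */
    return (hi4 << 28) | (mid11 << 17) | lo17;
}
```

For instance, a 256-Mbyte block (BL = 0x7FF) with BRPN = 0x0800 maps EA 0x8ABCDEF0 to PA 0x1ABCDEF0: only EA0-EA3 are replaced, because every bit of EA4-EA14 passes through the mask.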
7.4.6 Block Address Translation Summary 


Figure 7-11 is an expansion of the ‘BAT Array Hit’ branch of Figure 7-3 and shows the 
translation of address bits for 32-bit implementations. Note that the figure does not show 
when many of the exceptions in Table 7-5 are detected or taken as this is implementation- 
specific. 


[Figure: BAT array hit. If the access is a read with PP = 00, or a write with PP = 00 or x1, the memory protection violation flow is taken (see Figure 7-9). Otherwise, PA0-PA31 = BRPN(0-3) || (BRPN(4-14) OR ((EA4-EA14) & BL)) || EA15-EA31, and the access continues to the memory subsystem with the WIMG bits from the lower BAT register.]

Figure 7-11. Block Address Translation Flow 


7.5 Memory Segment Model 


Memory in the PowerPC OEA is divided into 256-Mbyte segments. This segmented 
memory model provides a way to map 4-Kbyte pages of effective addresses to 4-Kbyte 
pages in physical memory (page address translation), while providing the programming 
flexibility afforded by a large virtual address space (52 bits). 


A page address translation may be superseded by a matching block address translation as 
described in Section 7.4, “Block Address Translation.” If not, the page translation proceeds 
in the following two steps: 


1. from effective address to the virtual address (which never exists as a specific entity 
but can be considered to be the concatenation of the virtual page number and the byte 
offset within a page), and 


2. from virtual address to physical address. 


The page address translation mechanism is described in the following sections, followed by 
a summary of page address translation with a detailed flow diagram. 
7.5.1 Recognition of Addresses in Segments 


The page address translation uses segment descriptors, which provide virtual address and 
protection information, and page table entries (PTEs), which provide the physical address 
and page protection information. The segment descriptors are programmed by the operating 
system to provide the virtual ID for a segment. In addition, the operating system also creates 
the page table in memory that provides the virtual-to-physical address mappings (in the 
form of PTEs) for the pages in memory. 


Segments in the OEA can be classified as one of the following two types: 


• Memory segment—An effective address in these segments represents a virtual 
address that is used to define the physical address of the page. 


• Direct-store segment—References made to direct-store segments do not use the 
virtual paging mechanism of the processor. Note that the direct-store facility is 
optional and being removed from the architecture. See Section 7.7, “Direct-Store 
Segment Address Translation,” for a complete description of the mapping of direct- 
store segments for those processors that implement it. 


The T bit in the segment descriptor selects between memory segments and direct-store 
segments, as shown in Table 7-12. 


Table 7-12. Segment Descriptor Types

T Bit | Segment Descriptor Type
0 | Memory segment
1 | Direct-store segment: optional, but being removed from the architecture. Its use is discouraged.


7.5.1.1 Selection of Memory Segments 


All accesses generated by the processor can be mapped to a segment descriptor; however, 
if translation is disabled (MSR[IR] = 0 or MSR[DR] = 0 for an instruction or data access, 
respectively), real addressing mode translation is performed as described in Section 7.3, 
“Real Addressing Mode.” Otherwise, if T = 0 in the corresponding segment descriptor (and 
the address is not translated by the BAT mechanism), the access maps to memory space and 
page address translation is performed. 





After a memory segment is selected, the processor creates the virtual address for the 
segment and searches for the PTE that dictates the physical page number to be used for the 
access. Note that I/O devices can be easily mapped into memory space and used as 
memory-mapped I/O. 
7.5.1.2 Selection of Direct-Store Segments

As described for memory segments, all accesses generated by the processor (with
translation enabled) map to a segment descriptor. If T = 1 for the selected segment
descriptor, the access maps to the direct-store interface space and the access proceeds as
described in Section 7.7, “Direct-Store Segment Address Translation.” Because the
direct-store interface is present only for compatibility with existing I/O devices that used this
interface and because the direct-store interface protocol is not optimized for performance,
its use is discouraged. Additionally, future devices are not likely to support it. Thus,
software should not depend on its results and new software should not use it. The most
efficient method for accessing I/O is by mapping the I/O areas to memory segments.


7.5.2 Page Address Translation Overview 


The translation of effective addresses to physical addresses is shown in Figure 7-12. The 
address translation is as follows: 


• Bits 0-3 of the effective address form the segment register number used to select
a segment descriptor, from which the virtual segment ID (VSID) is extracted.


• Bits 4-19 of the effective address correspond to the page number within the
segment; these are concatenated with the VSID from the segment descriptor to form
the virtual page number (VPN). The VPN is used to search for the PTE in either an
on-chip TLB or the page table. The PTE then provides the physical page number
(RPN).

• Bits 20-31 of the effective address are the byte offset within the page; these are
concatenated with the RPN field of a PTE to form the physical address used to
access memory.
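The three steps above can be sketched as bit manipulation on a 32-bit effective address. This is a minimal illustration, not an implementation of any processor: the function names are hypothetical, the 24-bit VSID width is taken from the segment register format (bits 8-31), and 4-Kbyte pages are assumed. Note that PowerPC numbers bits from the most-significant end, so EA bits 0-3 are the high-order nibble.

```python
def split_effective_address(ea: int) -> tuple[int, int, int]:
    """Split a 32-bit EA into (segment register number, page index, byte offset).

    PowerPC numbers bits from the most-significant end, so EA[0-3] are the
    four high-order bits of the address.
    """
    sr_num = (ea >> 28) & 0xF           # EA[0-3]: selects one of 16 segment registers
    page_index = (ea >> 12) & 0xFFFF    # EA[4-19]: page number within the segment
    offset = ea & 0xFFF                 # EA[20-31]: byte offset within a 4-Kbyte page
    return sr_num, page_index, offset


def virtual_page_number(vsid: int, page_index: int) -> int:
    """Concatenate the 24-bit VSID with the 16-bit page index (40-bit VPN)."""
    return ((vsid & 0xFFFFFF) << 16) | (page_index & 0xFFFF)


def physical_address(rpn: int, offset: int) -> int:
    """Concatenate the 20-bit RPN from the PTE with the 12-bit byte offset."""
    return ((rpn & 0xFFFFF) << 12) | (offset & 0xFFF)
```

For example, the EA 0x30123456 selects segment register 3, page index 0x0123, and byte offset 0x456; the translated address keeps the offset and replaces the upper 20 bits with the RPN.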
Figure 7-12. Page Address Translation Overview


7.5.2.1 Segment Descriptor Definitions 


The fields in the segment descriptors are interpreted differently depending on the value of
the T bit within the descriptor. When T = 1, the segment descriptor defines a direct-store
segment, and the format is as described in Section 7.7.1, “Segment Descriptors for
Direct-Store Segments.”


7.5.2.1.1 Segment Descriptor Format 


The segment descriptors are 32 bits long and reside in one of 16 on-chip segment registers. 
Figure 7-13 shows the format of a segment register used in page address translation (T = 0). 


[_] Reserved
T Ks Kp N 0000 VSID
0 1 2 3 4 7 8 31


Figure 7-13. Segment Register Format for Page Address Translation
Table 7-13 provides the corresponding bit definitions of the segment register.


Table 7-13. Segment Register Bit Definition for Page Address Translation

Bit | Name | Description
0 | T | T = 0 selects this format
1 | Ks | Supervisor-state protection key
2 | Kp | User-state protection key
3 | N | No-execute protection
4-7 | | Reserved
8-31 | VSID | Virtual segment ID
The Ks and Kp bits partially define the access protection for the pages within the segment. 
The page protection provided in the PowerPC OEA is described in Section 7.5.4, “Page 
Memory Protection.” The virtual segment ID field is used as the high-order bits of the 
virtual page number (VPN) as shown in Figure 7-12. 
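A segment register in the T = 0 format can be decoded field by field. The sketch below is illustrative only: the class and function names are hypothetical, and it simply applies the bit positions listed in Table 7-13 using PowerPC's most-significant-bit-is-bit-0 convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemorySegmentDescriptor:
    ks: int    # bit 1: supervisor-state protection key
    kp: int    # bit 2: user-state protection key
    n: int     # bit 3: no-execute protection
    vsid: int  # bits 8-31: virtual segment ID


def bit(word: int, n: int) -> int:
    """Return bit n of a 32-bit word, numbering bit 0 as the most significant."""
    return (word >> (31 - n)) & 1


def decode_segment_register(sr: int) -> MemorySegmentDescriptor:
    """Decode a segment register value that uses the T = 0 format."""
    if bit(sr, 0):
        raise ValueError("T = 1: direct-store segment, different format")
    return MemorySegmentDescriptor(
        ks=bit(sr, 1),
        kp=bit(sr, 2),
        n=bit(sr, 3),
        vsid=sr & 0x00FFFFFF,  # bits 4-7 are reserved and ignored here
    )
```

Rejecting T = 1 in the decoder reflects the fact that direct-store descriptors use a different layout.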


The segment registers are programmed with specific instructions that reference the segment 
registers. However, since the segment registers described here are merely a conceptual 
model, a processor may implement separate segment registers for instructions and for data, 
for example. In this case, it is the responsibility of the hardware to maintain the consistency 
between the multiple sets of segment registers. 


The segment register instructions are summarized in Table 7-6. These instructions are 
privileged in that they are executable only while operating in supervisor mode. See 
Section 2.3.17, “Synchronization Requirements for Special Registers and for Lookaside 
Buffers,” for information about the synchronization requirements when modifying the 
segment registers. See Chapter 8, “Instruction Set,” for more detail on the encodings of 
these instructions. 
7.5.2.2 Page Table Entry (PTE) Definitions

Page table entries (PTEs) are generated and placed in the page table in memory by the
operating system using the hashing algorithm described in Section 7.6.1.3, “Page Table 
Hashing Functions.” The PowerPC OEA defines PTEs that are 64 bits in length. Some of 
the fields are defined as follows: 


• The virtual segment ID field corresponds to the high-order bits of the virtual page
number (VPN), and, along with the H, V, and API fields, it is used to locate the PTE
(used as match criteria in comparing the PTE with the segment information).


• The R and C bits maintain history information for the page as described in
Section 7.5.3, “Page History Recording.”


• The WIMG bits define the memory/cache control mode for accesses to the page.


• The PP bits define the remaining access protection constraints for the page. The
page protection provided by PowerPC processors is described in Section 7.5.4,
“Page Memory Protection.”


Conceptually, the page table in memory must be searched to translate the address of every 
reference. For performance reasons, however, some processors use on-chip TLBs to cache 
copies of recently-used PTEs so that the table search time is eliminated for most accesses. 
In this case, the TLB is searched for the address translation first. If a copy of the PTE is 
found, then no page table search is performed. As TLBs are noncoherent caches of PTEs, 
software that changes the page table in any way must perform the appropriate TLB 
invalidate operations to keep the on-chip TLBs coherent with respect to the page table in 
memory. 


7.5.2.2.1 PTE Format
Figure 7-14 shows the format of the two words that comprise a PTE for 32-bit 
implementations. 











[_] Reserved
0 1 24 25 26 31
V VSID H API
RPN 000 R C WIMG 0 PP
0 19 20 22 23 24 25 28 29 30 31


Figure 7-14. Page Table Entry Format 


Table 7-14 lists the corresponding bit definitions for each word in a PTE as defined above. 
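Table 7-14. PTE Bit Definitions

Word | Bit | Name | Description
0 | 0 | V | Entry valid (V = 1) or invalid (V = 0)
0 | 1-24 | VSID | Virtual segment ID
0 | 25 | H | Hash function identifier
0 | 26-31 | API | Abbreviated page index
1 | 0-19 | RPN | Physical page number
1 | 20-22 | | Reserved
1 | 23 | R | Referenced bit
1 | 24 | C | Changed bit
1 | 25-28 | WIMG | Memory/cache control bits
1 | 29 | | Reserved
1 | 30-31 | PP | Page protection bits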


The PTE contains an abbreviated page index (API) rather than the complete page
index field because at least ten of the low-order bits of the page index are used in the hash
function to select a PTEG address (PTEG addresses define the location of a PTE).
Therefore, these ten low-order bits are not repeated in the PTEs of that PTEG; the 6-bit API
holds only the high-order bits of the 16-bit page index.
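The two PTE words can be unpacked with shifts derived from the bit positions in Table 7-14 (a field ending at PowerPC bit b is shifted right by 31 - b). This is a hypothetical helper for illustration; the dictionary keys are arbitrary names, not architected identifiers. It also shows how the API relates to the 16-bit page index from the effective address.

```python
def pte_fields(word0: int, word1: int) -> dict:
    """Unpack the two 32-bit words of a PTE using the Table 7-14 layout."""
    return {
        "v":    (word0 >> 31) & 0x1,       # word 0, bit 0
        "vsid": (word0 >> 7) & 0xFFFFFF,   # word 0, bits 1-24
        "h":    (word0 >> 6) & 0x1,        # word 0, bit 25
        "api":  word0 & 0x3F,              # word 0, bits 26-31
        "rpn":  (word1 >> 12) & 0xFFFFF,   # word 1, bits 0-19
        "r":    (word1 >> 8) & 0x1,        # word 1, bit 23
        "c":    (word1 >> 7) & 0x1,        # word 1, bit 24
        "wimg": (word1 >> 3) & 0xF,        # word 1, bits 25-28
        "pp":   word1 & 0x3,               # word 1, bits 30-31
    }


def api_from_page_index(page_index: int) -> int:
    """The API is the six high-order bits of the 16-bit page index (EA[4-9])."""
    return (page_index >> 10) & 0x3F
```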





7.5.3 Page History Recording 


Referenced (R) and changed (C) bits in each PTE keep history information about the page. 
The operating system then uses this information to determine which areas of memory to 
write back to disk when new pages must be allocated in main memory. Referenced and 
changed recording is performed only for accesses made with page address translation and 
not for translations made with the BAT mechanism or for accesses that correspond to direct- 
store (T = 1) segments. Furthermore, R and C bits are maintained only for accesses made 
while address translation is enabled (MSR[IR] = 1 or MSR[DR] = 1). 
In general, the referenced and changed bits are updated to reflect the status of the page
based on the access, as shown in Table 7-15.


Table 7-15. Table Search Operations to Update History Bits

R and C Bits in TLB Entry | Processor Action
00 | Read: Table search operation to update R
   | Write: Table search operation to update R and C
01 | Combination doesn’t occur
10 | Read: No special action
   | Write: Table search operation to update C
11 | No special action for read or write
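The decision in Table 7-15 reduces to two checks on the TLB copy of the history bits. The sketch below is a hypothetical model (the function name and the set-of-strings return value are illustrative choices): it returns the history bits that require a table search operation, and an empty set means no special action.

```python
def history_bits_to_update(tlb_r: int, tlb_c: int, is_write: bool) -> frozenset:
    """Return which PTE history bits need a table search operation (Table 7-15).

    tlb_r, tlb_c: the R and C bits held in the TLB entry used for the access.
    """
    if tlb_r == 0 and tlb_c == 1:
        raise ValueError("R = 0, C = 1 does not occur in a TLB entry")
    needed = set()
    if tlb_r == 0:
        needed.add("R")                 # any access must record the reference
    if is_write and tlb_c == 0:
        needed.add("C")                 # first store to the page records the change
    return frozenset(needed)
```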


In processors that implement a TLB, the processor may perform the R and C bit updates 
based on the copies of these bits resident in the TLB. For example, the processor may 
update the C bit based only on the status of the C bit in the TLB entry in the case of a TLB 
hit (the R bit may be assumed to be set in the page tables if there is a TLB hit). Therefore, 
when software clears the R and C bits in the page tables in memory, it must invalidate the 
TLB entries associated with the pages whose referenced and changed bits were cleared. See 
Section 7.6.3, “Page Table Updates,” for all of the constraints imposed on the software 
when updating the referenced and changed bits in the page tables. 





The R bit for a page may be set by the execution of the dcbt or dcbtst instruction to that
page. However, neither of these instructions causes the C bit to be set.


PowerPC processors perform the updates of the referenced and changed bits as if
address translation were disabled (that is, as real addressing mode accesses).


7.5.3.1 Referenced Bit 

The referenced bit for each virtual page is located in the PTE. Every time a page is 
referenced (by an instruction fetch, or any other read or write access) the referenced bit is 
set in the page table. The referenced bit may be set immediately, or the setting may be 
delayed until the memory access is determined to be successful. Because the reference to a 
page is what causes a PTE to be loaded into the TLB, some processors may assume the R 
bit in the TLB is always set. The processor never automatically clears the referenced bit. 


The referenced bit is only a hint to the operating system about the activity of a page. At 
times, the referenced bit may be set although the access was not logically required by the 
program or even if the access was prevented by memory protection. Examples of this 
include the following: 

• Fetching of instructions not subsequently executed

• Accesses generated by an lswx or stswx instruction with a zero length

• Accesses generated by an stwcx. instruction when no store is performed

• Accesses that cause exceptions and are not completed
7.5.3.2 Changed Bit

The changed bit for each virtual page is located both in the PTE in the page table and in the 
copy of the PTE loaded into the TLB (if a TLB is implemented). Whenever a data store 
instruction is executed successfully, if the TLB search (for page address translation) results 
in a hit, the changed bit in the matching TLB entry is checked. If it is already set, it is not 
updated. If the TLB changed bit is 0, it is set and a table search operation is performed to 
set the C bit in the corresponding PTE in the page table. 


Processors cause the changed bit (in both the PTE in the page tables and in the TLB if
implemented) to be set only when a store operation is allowed by the page memory
protection mechanism and the store is guaranteed to be in the execution path, unless an
exception other than one caused by any of the following occurs:


• System-caused interrupts (system reset, machine check, external, and decrementer
interrupts)


• Floating-point enabled exception type program exceptions when the processor is in
an imprecise mode


• Floating-point assist exceptions for instructions that cause no other kind of precise
exception


Furthermore, the following conditions may cause the C bit to be set: 


• The execution of an stwcx. instruction is allowed by the memory protection
mechanism but a store operation is not performed.


• The execution of an stswx instruction is allowed by the memory protection
mechanism but a store operation is not performed because the specified length is
zero.


• A dcba or dcbi instruction is executed.


No other cases cause the C bit to be set. 


7.5.3.3 Scenarios for Referenced and Changed Bit Recording 

This section summarizes the model, defined by the OEA, that governs how PowerPC
processors that maintain the referenced and changed bits automatically in hardware set
the R and C bits. In some scenarios, the bits are guaranteed to be set by the
processor; in some scenarios, the architecture allows that the bits may be set (not absolutely
required); and in some scenarios, the bits are guaranteed to not be set. Note that when the
hardware updates the R and C bits in memory, the accesses are performed as a physical
memory access, as if the WIMG bit settings were 0b0010 (that is, as unguarded cacheable
operations in which coherency is required).


In implementations that do not maintain the R and C bits in hardware, software assistance
is required. For these processors, the information in this section still applies, except that the
software performing the updates is constrained to the rules described (that is, it must set bits
shown as guaranteed to be set and must not set bits shown as guaranteed to not be set). Note
that this software should be contained in the area of memory reserved for
implementation-specific use and should be invisible to the operating system.


Table 7-16 defines a prioritized list of the R and C bit settings for all scenarios. The entries 
in the table are prioritized from top to bottom, such that a matching scenario occurring 
closer to the top of the table takes precedence over a matching scenario closer to the bottom 
of the table. For example, if an stwcx. instruction causes a protection violation and there is
no reservation, the C bit is not altered, as shown for the protection violation case. Note that 
in the table, load operations include those generated by load instructions, by the eciwx 
instruction, and by the cache management instructions that are treated as loads with respect 
to address translation. Similarly, store operations include those operations generated by 
store instructions, by the ecowx instruction, and by the cache management instructions that 
are treated as stores with respect to address translation. 


Table 7-16. Model for Guaranteed R and C Bit Settings

Priority | Scenario | Causes Setting of R Bit | Causes Setting of C Bit
1 | No-execute protection violation | No | No
2 | Page protection violation | Maybe | No
3 | Out-of-order instruction fetch or load operation | Maybe | No
4 | Out-of-order store operation for instructions that will cause no other kind of precise exception (in the absence of system-caused, imprecise, or floating-point assist exceptions) | Maybe¹ | Maybe¹
5 | All other out-of-order store operations | Maybe | No
6 | Zero-length load (lswx) | Maybe | No
7 | Zero-length store (stswx) | Maybe¹ | Maybe¹
8 | Store conditional (stwcx.) that does not store | Maybe¹ | Maybe¹
9 | In-order instruction fetch | Yes² | No
10 | Load instruction or eciwx | Yes | No
11 | Store instruction, ecowx, dcbz, or dcba³ instruction | Yes | Yes
12 | icbi, dcbt, dcbtst, dcbst, or dcbf instruction | Maybe | No
13 | dcbi instruction | Maybe¹ | Maybe¹

Notes:
1 If C is set, R is guaranteed to also be set.
2 This includes the case in which the instruction was fetched out of order and R was not set.
3 For a dcba instruction that does not modify the target block, it is possible that neither bit is set.
7.5.3.4 Synchronization of Memory Accesses and Referenced and Changed Bit Updates

Although the processor updates the referenced and changed bits in the page tables
automatically, these updates are not guaranteed to be immediately visible to the program
after the load, store, or instruction fetch operation that caused the update. If processor A
executes a load or store or fetches an instruction, the following conditions are met with
respect to performing the access and performing any R and C bit updates:


• If processor A subsequently executes a sync instruction, both the updates to the bits
in the page table and the load or store operation are guaranteed to be performed with
respect to all processors and mechanisms before the sync instruction completes on
processor A.


• Additionally, if processor B executes a tlbie instruction that
— signals the invalidation to the hardware,
— invalidates the TLB entry for the access in processor A, and
— is detected by processor A after processor A has begun the access,
and processor B executes a tlbsync instruction after it executes the tlbie, both the
updates to the bits and the original access are guaranteed to be performed with
respect to all processors and mechanisms before the tlbsync instruction completes
on processor B.


7.5.4 Page Memory Protection 

In addition to the no-execute option that can be programmed at the segment descriptor level 
to prevent instructions from being fetched from a given segment (shown in Figure 7-4), 
there are a number of other memory protection options that can be programmed at the page 
level. The page memory protection mechanism allows selectively granting read access, 
granting read/write access, and prohibiting access to areas of memory based on a number 
of control criteria. 


The memory protection used by the block and page address translation mechanisms is 
different in that the page address translation protection defines a key bit that, in conjunction 
with the PP bits, determines whether supervisor and user programs can access a page. For 
specific information about block address translation, refer to Section 7.4.4, “Block 
Memory Protection.” 


For page address translation, the memory protection mechanism is controlled by the
following:
• MSR[PR], which defines the mode of the access as follows:
— MSR[PR] = 0 corresponds to supervisor mode
— MSR[PR] = 1 corresponds to user mode
• Ks and Kp, the supervisor and user key bits, which define the key for the page
• The PP bits, which define the access options for the page
The key bits (Ks and Kp) and the PP bits are located as follows for page address translation:
• Ks and Kp are located in the segment descriptor.
• The PP bits are located in the PTE.


The key bits, the PP bits, and the MSR[PR] bit are used as follows:
• When an access is generated, one of the key bits is selected to be the key as follows:
— For supervisor accesses (MSR[PR] = 0), the Ks bit is used and Kp is ignored.
— For user accesses (MSR[PR] = 1), the Kp bit is used and Ks is ignored.
That is, key = (Kp & MSR[PR]) | (Ks & ¬MSR[PR]).
• The selected key is used with the PP bits to determine if instruction fetching, load
access, or store access is allowed.
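The key-selection formula above can be written directly with single-bit integers. This is a small illustrative helper (the name is hypothetical); the one subtlety is that the complement of MSR[PR] must be a one-bit complement, not Python's two's-complement `~`.

```python
def select_key(msr_pr: int, ks: int, kp: int) -> int:
    """Select the protection key for an access (all arguments are single bits).

    Implements key = (Kp & MSR[PR]) | (Ks & not MSR[PR]). The complement is
    written as msr_pr ^ 1 because Python's ~ operator on an int yields a
    negative number rather than a one-bit complement.
    """
    return (kp & msr_pr) | (ks & (msr_pr ^ 1))
```

The boolean form is equivalent to choosing Kp for user accesses and Ks for supervisor accesses, which an exhaustive check over the eight input combinations confirms.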


Table 7-17 shows the types of accesses that are allowed for the general case (all possible 
Ks, Kp, and PP bit combinations), assuming that the N bit in the segment descriptor is 
cleared (the no-execute option is not selected). 


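Table 7-17. Access Protection Control with Key

Key¹ | PP² | Page Type
0 | 00 | Read/write
0 | 01 | Read/write
0 | 10 | Read/write
0 | 11 | Read only
1 | 00 | No access
1 | 01 | Read only
1 | 10 | Read/write
1 | 11 | Read only

Notes:
1 Ks or Kp selected by state of MSR[PR]
2 PP protection option bits in PTE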





Thus, the conditions that cause a protection violation (not including the no-execute
protection option for instruction fetches) are depicted in Table 7-18 and as a flow diagram
in Figure 7-17. Any access attempted (read or write) when key = 1 and PP = 00 causes
a protection violation exception condition. When key = 1 and PP = 01, an attempt to
perform a write access causes a protection violation exception condition. When PP = 10, all
accesses are allowed, and when PP = 11, write accesses always cause an exception. The
processor takes either the ISI or the DSI exception (for an instruction or data access,
respectively) when there is an attempt to violate the memory protection.
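The rules above amount to a lookup on (key, PP) followed by a check of the access type. The sketch below encodes the page types of Table 7-17 in a dictionary; the names are hypothetical, instruction fetches are treated as reads, and the no-execute (N bit) check is deliberately left out, as it is in the tables.

```python
# Page types from Table 7-17, indexed by (key, PP).
PAGE_TYPE = {
    (0, 0b00): "read/write", (0, 0b01): "read/write",
    (0, 0b10): "read/write", (0, 0b11): "read only",
    (1, 0b00): "no access",  (1, 0b01): "read only",
    (1, 0b10): "read/write", (1, 0b11): "read only",
}


def access_allowed(key: int, pp: int, is_write: bool) -> bool:
    """Return True if the access passes the page protection check.

    Reads include instruction fetches; the N bit is not considered here.
    A False result corresponds to an ISI or DSI exception condition.
    """
    page_type = PAGE_TYPE[(key, pp & 0b11)]
    if page_type == "no access":
        return False
    if is_write:
        return page_type == "read/write"
    return True
```

Enumerating the prohibited writes from this table yields key || PP = 011, 100, 101, and 111, which matches the prohibited-access combinations described in the text.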
Table 7-18. Exception Conditions for Key and PP Combinations

Key | PP | Prohibited Accesses
1 | 00 | Read/write
1 | 01 | Write
x | 10 | None
x | 11 | Write
0 | 0x | None


Any combination of the Ks, Kp, and PP bits is allowed. One example is if the Ks and Kp
bits are programmed so that the value of the key bit for Table 7-17 directly matches the
MSR[PR] bit for the access. In this case, the encoding Ks = 0 and Kp = 1 is used in the
segment descriptor, and the PP bits in the PTE then enforce the protection options shown in
Table 7-19.





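Table 7-19. Access Protection Encoding of PP Bits for Ks = 0 and Kp = 1

PP Field | User Read (Key = 1) | User Write (Key = 1) | Supervisor Read (Key = 0) | Supervisor Write (Key = 0)
00 | No | No | Yes | Yes
01 | Yes | No | Yes | Yes
10 | Yes | Yes | Yes | Yes
11 | Yes | No | Yes | No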

However, if the setting Ks = 1 is used, supervisor accesses are treated as user reads and
writes with respect to Table 7-19. Likewise, if the setting Kp = 0 is used, user accesses to 
the page are treated as supervisor accesses in relation to Table 7-19. Therefore, by 
modifying one of the key bits (in the segment descriptor), the way the processor interprets 
accesses (supervisor or user) in a particular segment can easily be changed. Note, however, 
that only supervisor programs are allowed to modify the key bits for the segment descriptor. 
Access to the segment registers is privileged. 
When the memory protection mechanism prohibits a reference, the flow of events is similar
to that for a memory protection violation occurring with the block protection mechanism.
As shown in Figure 7-15, one of the following occurs depending on the type of access that
was attempted:

• For data accesses, a DSI exception is generated and DSISR[4] is set. If the access is
a store, DSISR[6] is also set.

• For instruction accesses,
— an ISI exception is generated and SRR1[4] is set, or
— an ISI exception is generated and SRR1[3] is set if the segment is designated as
no-execute.


The only difference between the flow shown in Figure 7-15 and that of the block memory 
protection violation is the ISI exception that can be caused by an attempt to fetch an 
instruction from a segment that has been designated as no-execute (N bit set in the segment 
descriptor). See Chapter 6, “Exceptions,” for more information about these exceptions. 


[Figure: Page memory protection violation. Instruction access: if the N bit is set in the segment descriptor, SRR1[3] ← 1; otherwise, SRR1[4] ← 1; an ISI exception is then taken. Data access: for a dcbt/dcbtst instruction, the access is aborted; otherwise, DSISR[4] ← 1 and a DSI exception is taken.]


Figure 7-15. Memory Protection Violation Flow for Pages
If the page protection mechanism prohibits a store operation, the changed bit is not set (in
either the TLB or in the page tables in memory); however, a prohibited store access may 
cause a PTE to be loaded into the TLB and consequently cause the referenced bit to be set 
in a PTE (both in the TLB and in the page table in memory). 


7.5.5 Page Address Translation Summary 


Figure 7-16 provides the detailed flow for the page address translation mechanism. The
figure includes the checking of the N bit in the segment descriptor and then expands on the
“TLB Hit” branch of Figure 7-4. The detailed flow for the “TLB Miss” branch of Figure 7-4
is described in Section 7.6.2, “Page Table Search Operation.” The checking of memory
protection violation conditions for page address translation is shown in Figure 7-17. The
“Invalidate TLB Entry” box shown in Figure 7-16 is marked as implementation-specific as
this level of detail for TLBs (and the existence of TLBs) is not dictated by the architecture.
Note that the figure does not show the detection of all exception conditions shown in
Table 7-4 and Table 7-5; the flow for many of these exceptions is implementation-specific.
Figure 7-16. Page Address Translation Flow—TLB Hit
[Figure: Check page memory protection violation conditions. Select key: if MSR[PR] = 0, key = Ks; if MSR[PR] = 1, key = Kp. A write access with key || PP equal to any of 011, 100, 101, or 111, or a read access with key || PP = 100, is prohibited (see Figure 7-15); otherwise, the access is permitted.]


Figure 7-17. Page Memory Protection Violation Conditions for Page Address Translation


7.6 Hashed Page Tables 


If a copy of the PTE corresponding to the VPN for an access is not resident in a TLB 
(corresponding to a miss in the TLB, provided a TLB is implemented), the processor must 
search for the PTE in the page tables set up by the operating system in main memory. 


The algorithm specified by the architecture for accessing the page tables includes a hashing 
function on some of the virtual address bits. Thus, the addresses for PTEs are allocated 
more evenly within the page tables and the hit rate of the page tables is maximized. The
operating system must implement the same algorithm so that it places the page table
entries correctly in main memory.


If page table search operations are performed automatically by the hardware, they are 
performed using physical addresses and as if the memory access attribute bit M = 1 
(memory coherency enforced in hardware). If the software performs the page table search 
operations, the accesses must be performed in real addressing mode (MSR[DR] = 0); this 
additionally guarantees that M = 1. 


This section describes the format of the page tables and the algorithm used to access them. 
In addition, the constraints imposed on the software in updating the page tables (and other 
MMU resources) are described. 
7.6.1 Page Table Definition


The hashed page table is a variable-sized data structure that defines the mapping between 
virtual page numbers and physical page numbers. The page table size is a power of 2, its 
starting address is a multiple of its size, and the table must reside in memory with the 
WIMG attributes of 0b0010. 


The page table contains a number of page table entry groups (PTEGs). For 32-bit 
implementations, a PTEG contains eight PTEs of eight bytes each; therefore, each PTEG 
is 64 bytes long. PTEG addresses are entry points for table search operations. Figure 7-18 
shows two PTEG addresses (PTEGaddr1 and PTEGaddr2) where a given PTE may reside.


[Figure: Page table spanning PTEG0 through PTEGn, with PTEGaddr1 and PTEGaddr2 marking the two PTEGs in which a given PTE may reside.]


Figure 7-18. Page Table Definitions


A given PTE can reside in one of two possible PTEGs: one is the primary PTEG and the
other is the secondary PTEG. Additionally, a given PTE can reside in any of the PTE 
locations within an addressed PTEG. Thus, a given PTE may reside in one of 16 possible 
locations within the page table. If a given PTE is not in either the primary or secondary 
PTEG, a page table miss occurs, corresponding to a page fault condition. 
A table search operation is defined as the search for a PTE within a primary and secondary
PTEG. When a table search operation commences, a primary hashing function is performed 
on the virtual address. The output of the hashing function is then concatenated with bits 
programmed into the SDR1 register by the operating system to create the physical address 
of the primary PTEG. The PTEs in the PTEG are then checked, one by one, to see if there 
is a hit within the PTEG. If the PTE is not located, a secondary hashing function is 
performed, a new physical address is generated for the PTEG, and the PTE is searched for 
again, using the secondary PTEG address. 
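The table search operation just described can be modeled on two already-located PTEGs. This is a simplified sketch, not a hardware description: each PTEG is assumed to be a sequence of up to eight (word0, word1) tuples, the names are hypothetical, and the eight entries are checked in slot order. A PTE matches when V = 1 and its VSID, H, and API equal the search key, with H = 0 for the primary PTEG and H = 1 for the secondary PTEG.

```python
from typing import Optional, Sequence


def pte_matches(word0: int, vsid: int, api: int, h: int) -> bool:
    """Compare word 0 of a PTE against the search key (V = 1, VSID, H, API)."""
    expected = (1 << 31) | ((vsid & 0xFFFFFF) << 7) | ((h & 1) << 6) | (api & 0x3F)
    return word0 == expected  # word 0 holds exactly V, VSID, H, and API


def search_ptegs(primary: Sequence[tuple[int, int]],
                 secondary: Sequence[tuple[int, int]],
                 vsid: int, api: int) -> Optional[tuple[int, int]]:
    """Search the primary PTEG (H = 0), then the secondary PTEG (H = 1)."""
    for h, pteg in ((0, primary), (1, secondary)):
        for word0, word1 in pteg:       # up to eight PTEs per PTEG
            if pte_matches(word0, vsid, api, h):
                return word0, word1
    return None  # page table miss: corresponds to a page fault condition
```

Checking H as part of the match is what lets one PTEG serve as the primary PTEG for some addresses and the secondary PTEG for others without ambiguity.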


Note, however, that although a given PTE may reside in one of 16 possible locations, an 
address that is a primary PTEG address for some accesses also functions as a secondary 
PTEG address for a second set of accesses (as defined by the secondary hashing function). 
Therefore, these 16 possible locations are really shared by two different sets of effective 
addresses. Section 7.6.1.6, “Page Table Structure Examples,” illustrates how PTEs map 
into the 16 possible locations as primary and secondary PTEs. 


7.6.1.1 SDR1 Register Definitions 

The SDR1 register contains the control information for the page table structure in that it 
defines the high-order bits for the physical base address of the page table and it defines the 
size of the table. Note that there are certain synchronization requirements for writing to 
SDR1 that are described in Section 2.3.17, “Synchronization Requirements for Special 
Registers and for Lookaside Buffers.” The format of the SDR1 register is shown in the 
following sections. 


Figure 7-19 shows the format of the SDR1 register. 





[_] Reserved 
HTABORG 0000 000 HTABMASK 
0 15 16 22 23 31 


Figure 7-19. SDR1 Register Format 
Bit settings are described in Table 7-20. 


Table 7-20. SDR1 Register Bit Settings

Bits | Name | Description
0-15 | HTABORG | Physical base address of page table
16-22 | | Reserved
23-31 | HTABMASK | Mask for page table address
The HTABORG field in SDR1 contains the high-order 16 bits of the 32-bit physical address
of the page table. Therefore, the beginning of the page table lies on a 2^16-byte (64-Kbyte)
boundary at a minimum. If the processor does not support 32 bits of physical address,
software should write zeros to those unsupported bits in the HTABORG field (as the
implementation treats them as reserved). Otherwise, a machine check exception can occur.


A page table can be any size 2^n bytes, where 16 ≤ n ≤ 25. The HTABMASK field in SDR1
contains a mask value that determines how many bits from the output of the hashing
function are used as the page table index. This mask must be of the form 0b00...011...1 (a
string of 0 bits followed by a string of 1 bits). As the table size increases, more bits are used
from the output of the hashing function to index into the table. The 1 bits in HTABMASK
determine how many additional bits (beyond the minimum of 10) from the hash are used in
the index; the HTABORG field must have the same number of low-order bits equal to 0 as
the HTABMASK field has low-order bits equal to 1.


Example: 


Suppose that the page table is 16,384 (2^14) 64-byte PTEGs, for a total size of 2^20 bytes
(1 Mbyte). A 14-bit index is required. Ten bits are provided from the hash to start with,
so 4 additional bits from the hash must be selected. Thus the value in HTABMASK must
be 0b0_0000_1111 (0x00F), and the value in HTABORG must have its low-order 4 bits
(SDR1[12-15]) equal to 0. This means that the page table must begin on a
2^(4 + 10 + 6) = 2^20-byte (1-Mbyte) boundary.
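The sizing rules above can be checked mechanically. The sketch below builds an SDR1 value from a physical base address and a table size; the function name is hypothetical, and it assumes only what the text states: the table is 2^n bytes with 16 ≤ n ≤ 25, its base is a multiple of its size, HTABORG holds the high-order 16 address bits, and HTABMASK (bits 23-31) has one low-order 1 bit for each hash bit beyond the minimum of 10 (a 2^16-byte table is 2^10 PTEGs of 64 bytes).

```python
def sdr1_for_table(base: int, size_bytes: int) -> int:
    """Build a 32-bit SDR1 value for a page table of size_bytes at physical base.

    HTABORG occupies SDR1 bits 0-15, bits 16-22 are reserved (left 0),
    and HTABMASK occupies bits 23-31.
    """
    if size_bytes <= 0 or size_bytes & (size_bytes - 1):
        raise ValueError("table size must be a power of 2")
    n = size_bytes.bit_length() - 1
    if not 16 <= n <= 25:
        raise ValueError("table size must be 2**n bytes with 16 <= n <= 25")
    if base % size_bytes:
        raise ValueError("base must be a multiple of the table size")
    extra_bits = n - 16                  # hash bits used beyond the minimum of 10
    htabmask = (1 << extra_bits) - 1     # 'extra_bits' low-order 1 bits
    htaborg = (base >> 16) & 0xFFFF      # alignment leaves matching low bits at 0
    return (htaborg << 16) | htabmask
```

Rejecting a misaligned base enforces the rule that HTABORG must have as many low-order 0 bits as HTABMASK has low-order 1 bits.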


7.6.1.2 Page Table Size 


The number of entries in the page table directly affects performance because it influences 
the hit ratio in the page table and thus the rate of page fault exception conditions. If the table 
is too small, not all virtual pages that have physical page frames assigned may be mapped 
via the page table. This can happen if more than 16 entries map to the same 
primary/secondary pair of PTEGs; in this case, many hash collisions may occur. 


In a 32-bit implementation, the minimum size for a page table is 64 Kbytes (2^10 PTEGs of
64 bytes each). However, it is recommended that the total number of PTEGs in the page
table be at least half the number of physical page frames to be mapped. While avoidance of
hash collisions cannot be guaranteed for any size page table, making the page table larger
than the recommended minimum size reduces the frequency of such collisions by making
the primary PTEGs more sparsely populated, thereby further reducing the need to use the
secondary PTEGs.


Table 7-21 shows some example sizes for total main memory in a 32-bit system. The 
recommended minimum page table size for each of these example memory sizes is then outlined,
along with their corresponding HTABORG and HTABMASK settings in SDR1. Note that 
systems with less than 8 Mbytes of main memory may be designed with 32-bit processors, 
but the minimum amount of memory that can be used for the page tables in these cases is 
64 Kbytes. 
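Table 7-21. Minimum Recommended Page Table Sizes

Total Main Memory | Recommended Minimum Memory for Page Tables | Recommended Minimum Number of Mapped Pages (PTEs) | HTABORG (Maskable Bits 7-15) | HTABMASK
8 Mbytes (2^23) | 64 Kbytes (2^16) | 2^13 | x xxxx xxxx | 0 0000 0000
16 Mbytes (2^24) | 128 Kbytes (2^17) | 2^14 | x xxxx xxx0 | 0 0000 0001
32 Mbytes (2^25) | 256 Kbytes (2^18) | 2^15 | x xxxx xx00 | 0 0000 0011
64 Mbytes (2^26) | 512 Kbytes (2^19) | 2^16 | x xxxx x000 | 0 0000 0111
128 Mbytes (2^27) | 1 Mbyte (2^20) | 2^17 | x xxxx 0000 | 0 0000 1111
256 Mbytes (2^28) | 2 Mbytes (2^21) | 2^18 | x xxx0 0000 | 0 0001 1111
512 Mbytes (2^29) | 4 Mbytes (2^22) | 2^19 | x xx00 0000 | 0 0011 1111
1 Gbyte (2^30) | 8 Mbytes (2^23) | 2^20 | x x000 0000 | 0 0111 1111
2 Gbytes (2^31) | 16 Mbytes (2^24) | 2^21 | x 0000 0000 | 0 1111 1111
4 Gbytes (2^32) | 32 Mbytes (2^25) | 2^22 | 0 0000 0000 | 1 1111 1111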


As an example, if the physical memory size is 2^29 bytes (512 Mbytes), then there are
2^29 / 2^12 (4-Kbyte page size) = 2^17 (128 K) total page frames. If this number of page
frames is divided by 2, the resultant minimum recommended page table size is 2^16 PTEGs,
or 2^22 bytes (4 Mbytes) of memory for the page tables.
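The sizing rule can be checked with a short C sketch. It assumes 4-Kbyte pages, 64-byte PTEGs, a main memory size that is a power of two no larger than 4 Gbytes, and the 64-Kbyte (2^10-PTEG) architectural minimum; the struct and function names are hypothetical. It also derives the HTABMASK value for the resulting table: one 1 bit for each hash bit used beyond the first 10.

```c
#include <stdint.h>

/* Hypothetical result type for the sizing rule. */
typedef struct {
    uint64_t page_frames; /* number of 4-Kbyte page frames */
    uint64_t ptegs;       /* recommended minimum number of PTEGs */
    uint64_t table_bytes; /* page table size in bytes */
    uint32_t htabmask;    /* SDR1[HTABMASK] for that table size */
} pt_size;

/* mem_bytes: main memory size, assumed to be a power of two <= 2^32. */
pt_size recommended_page_table(uint64_t mem_bytes)
{
    pt_size s;
    s.page_frames = mem_bytes >> 12;        /* divide by the 4-Kbyte page size */
    s.ptegs = s.page_frames / 2;            /* at least half the page frames */
    if (s.ptegs < (UINT64_C(1) << 10))      /* architectural minimum: 2^10 PTEGs */
        s.ptegs = UINT64_C(1) << 10;
    s.table_bytes = s.ptegs * 64;           /* 64 bytes per PTEG */
    /* One HTABMASK 1 bit for each hash bit used beyond the first 10. */
    s.htabmask = (uint32_t)((s.ptegs >> 10) - 1);
    return s;
}
```

For 512 Mbytes of memory this gives 2^16 PTEGs (4 Mbytes) and HTABMASK = 0x3F; for 4 Gbytes it gives 32 Mbytes and HTABMASK = 0x1FF, matching Table 7-21.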


7.6.1.3 Page Table Hashing Functions 

The MMU uses two different hashing functions, a primary and a secondary, in the creation 
of the physical addresses used in a page table search operation. These hashing functions 
distribute the PTEs within the page table, in that there are two possible PTEGs where a 
given PTE can reside. Additionally, there are eight possible PTE locations within a PTEG 
where a given PTE can reside. If a PTE is not found using the primary hashing function, 
the secondary hashing function is performed, and the secondary PTEG is searched. Note 
that these two functions must also be used by the operating system to set up the page tables 
in memory appropriately. 


Typically, the hashing functions provide a high probability that a required PTE is resident 
in the page table, without requiring the definition of all possible PTEs in main memory. 
However, if a PTE is not found in the secondary PTEG either, a page fault occurs and an
exception is taken. The required PTE can then be placed into either the primary or
secondary PTEG by the system software, and on the next TLB miss to this page (in those
processors that implement a TLB), the PTE will be found in the page tables (and loaded
into an on-chip TLB).
The address of a PTEG is derived from the HTABORG field of the SDR1 register and the
output of the corresponding hashing function (primary hashing function for the primary PTEG
and secondary hashing function for the secondary PTEG). The value in the HTABMASK
field determines how many of the high-order hash value bits are masked and how many are
used in the generation of the physical address of the PTEG.


Figure 7-20 depicts the hashing functions defined by the PowerPC OEA for 32-bit
implementations. The inputs to the primary hashing function are the low-order 19 bits of
the VSID field of the selected segment register (bits 5-23 of the 52-bit virtual address), and
the page index field of the effective address (bits 24-39 of the virtual address) concatenated
with three zero high-order bits. The XOR of these two values generates the output of the
primary hashing function (hash value 1).


When the secondary hashing function is required, the one's complement of the output of
the primary hashing function (that is, hash value 1 with every bit inverted) provides
hash value 2.
Primary hash:
        Low-order 19 bits of VSID (from segment register), VA5-VA23
    XOR 000 || page index (from effective address), VA24-VA39
    =   Output of hashing function 1: hash value 1 (bits 0-18; bits 0-8 and bits 9-18)

Secondary hash:
        Hash value 1 (bits 0-18)
        One's complement function
    =   Output of hashing function 2: hash value 2 (bits 0-18; bits 0-8 and bits 9-18)

Figure 7-20. Hashing Functions for Page Tables 
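As a rough illustration, the two hashing functions can be written in C as follows. This is a sketch, not an architectural definition: the function names are hypothetical, the VSID is passed as the 24-bit value from the segment register, and the page index as the 16-bit field from EA bits 4-19. Hash values are kept in the low-order 19 bits of a uint32_t, so hash bit 0 (the most significant bit in the manual's numbering) is bit 18 of the C integer.

```c
#include <stdint.h>

#define HASH_BITS_MASK 0x7FFFFu   /* hash values are 19 bits wide */

/* Primary hash: low-order 19 bits of the VSID (VA bits 5-23) XORed with
   000 || page index (VA bits 24-39). Produces hash value 1. */
uint32_t primary_hash(uint32_t vsid, uint32_t page_index)
{
    return (vsid & HASH_BITS_MASK) ^ (page_index & 0xFFFFu);
}

/* Secondary hash: one's complement of hash value 1, kept to 19 bits.
   Produces hash value 2. */
uint32_t secondary_hash(uint32_t hash1)
{
    return ~hash1 & HASH_BITS_MASK;
}
```

With the example values used in Section 7.6.1.7 (VSID 0xCA701C, page index 0x0FFA), primary_hash returns 0x27FE6 (hash value 1) and secondary_hash returns 0x58019 (hash value 2).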
7.6.1.4 Page Table Addresses


The following sections illustrate the generation of the addresses used for accessing the 
hashed page tables. As stated earlier, the operating system must synthesize the table search 
algorithm for setting up the tables. 


Two of the elements that define the virtual address (the VSID field of the segment descriptor 
and the page index field of the effective address) are used as inputs into a hashing function. 
Depending on whether the primary or secondary PTEG is to be accessed, the processor uses 
either the primary or secondary hashing function as described in Section 7.6.1.3, “Page 
Table Hashing Functions.” 


Note that unless all accesses to be performed by the processor can be translated by the BAT
mechanism when address translation is enabled (MSR[DR] or MSR[IR] = 1), SDR1
must point to a valid page table. Otherwise, a machine check exception can occur.


Additionally, care should be taken that page table addresses do not conflict with those that
correspond to areas of the physical address map reserved for the exception vector table or
other implementation-specific purposes (refer to Section 7.2.1.2, “Predefined Physical
Memory Locations”).


For 32-bit implementations, the base address of the page table is defined by the high-order
bits of SDR1[HTABORG].


Effectively, bits 7-15 of the PTEG address are derived from the masking of the high-order
bits of the hash value (as defined by SDR1[HTABMASK]) concatenated with
(implemented as an OR function) the high-order bits of SDR1[HTABORG] as defined by
HTABMASK. Bits 16-25 of the PTEG address are the 10 low-order bits of the hash value,
and bits 26-31 of the PTEG address are zero. In the process of searching for a PTE, the
processor checks up to eight PTEs located in the primary PTEG and up to eight PTEs
located in the secondary PTEG, if required, searching for a match. Figure 7-21 provides a
graphical description of the generation of the PTEG addresses for 32-bit implementations.
Figure 7-21. Generation of Addresses for Page Tables
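A C sketch of the address generation described above follows. It assumes the manual's bit numbering (bit 0 is the most significant bit), a 19-bit hash value held in the low-order bits of a uint32_t, and an SDR1 value whose HTABORG bits covered by HTABMASK are zero; the function name is hypothetical.

```c
#include <stdint.h>

/* Forms a 32-bit PTEG address from SDR1 and a 19-bit hash value.
   SDR1: HTABORG = bits 0-15, HTABMASK = bits 23-31 (bit 0 = MSB). */
uint32_t pteg_address(uint32_t sdr1, uint32_t hash)
{
    uint32_t htaborg  = sdr1 >> 16;              /* 16-bit HTABORG */
    uint32_t htabmask = sdr1 & 0x1FFu;           /* 9-bit HTABMASK */
    uint32_t hash_hi  = (hash >> 10) & 0x1FFu;   /* hash bits 0-8 */
    uint32_t hash_lo  = hash & 0x3FFu;           /* hash bits 9-18 */

    uint32_t bits_0_6  = htaborg >> 9;                              /* HTABORG only */
    uint32_t bits_7_15 = (htaborg & 0x1FFu) | (hash_hi & htabmask); /* masked hash ORed in */

    /* Bits 16-25 are the low-order 10 hash bits; bits 26-31 are zero. */
    return (bits_0_6 << 25) | (bits_7_15 << 16) | (hash_lo << 6);
}
```

With SDR1 = 0x0F98_0007 and the hash values of the example in Section 7.6.1.7 (0x27FE6 and 0x58019), this sketch yields the primary and secondary PTEG addresses 0x0F9F_F980 and 0x0F98_0640.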
7.6.1.5 Page Table Structure Summary

In the process of searching for a PTE, the processor interprets the values read from memory 
as described in Section 7.5.2.2, “Page Table Entry (PTE) Definitions.” The VSID and the 
abbreviated page index (API) fields of the virtual address of the access are compared to 
those same fields of the PTEs in memory. In addition, the valid (V) bit and the hashing 
function (H) bit are also checked. For a hit to occur, the V bit of the PTE in memory must 
be set. If the fields match and the entry is valid, the PTE is considered a hit if the H bit is 
set as follows: 


• If this is the primary PTEG, H = 0
• If this is the secondary PTEG, H = 1


The physical address of the PTE(s) to be checked is derived as shown in Figure 7-31 and
Figure 7-21, and the generated address is the address of a group of eight PTEs (a PTEG).
During a table search operation, the processor compares up to 16 PTEs: PTE0-PTE7 of the
primary PTEG (defined by the primary hashing function) and PTE0-PTE7 of the secondary
PTEG (defined by the secondary hashing function).


If none of these PTEs matches (that is, for every PTE either the VSID or API field differs,
or V or H is not set appropriately), a page fault occurs and an exception is taken. Thus, if a
valid PTE is located in the page tables, the page is considered resident; if no matching (and
valid) PTE is found for an access, the page in question is interpreted as nonresident (page
fault) and the operating system must load the page into main memory and update the PTE
accordingly.
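The hit test applied to each PTE can be sketched in C from the PTE word 0 layout given in Section 7.5.2.2 (V in bit 0, VSID in bits 1-24, H in bit 25, API in bits 26-31, with bit 0 as the most significant bit). This is an illustrative model only; the function names are hypothetical, and the api argument is assumed to be the high-order 6 bits of the 16-bit page index.

```c
#include <stdbool.h>
#include <stdint.h>

/* Builds PTE word 0 (bit 0 = MSB): V = bit 0, VSID = bits 1-24,
   H = bit 25, API = bits 26-31. */
uint32_t make_pte_word0(bool v, uint32_t vsid, bool h, uint32_t api)
{
    return ((uint32_t)v << 31) | ((vsid & 0xFFFFFFu) << 7)
         | ((uint32_t)h << 6) | (api & 0x3Fu);
}

/* True if PTE word 0 is a hit: V = 1, VSID and API equal to those of the
   access, and H = 0 for a primary search or H = 1 for a secondary search. */
bool pte_hit(uint32_t w0, uint32_t vsid, uint32_t api, bool secondary)
{
    bool v_ok    = (w0 >> 31) == 1u;
    bool vsid_ok = ((w0 >> 7) & 0xFFFFFFu) == (vsid & 0xFFFFFFu);
    bool h_ok    = ((w0 >> 6) & 1u) == (secondary ? 1u : 0u);
    bool api_ok  = (w0 & 0x3Fu) == (api & 0x3Fu);
    return v_ok && vsid_ok && h_ok && api_ok;
}
```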


The architecture does not specify the order in which the PTEs are checked. Note, however,
that for maximum performance the operating system should allocate PTEs beginning with
the PTE0 location within the primary PTEG, then PTE1, and so on. If more than eight PTEs
are required within the address space that defines a PTEG address, the secondary PTEG
can be used (again, allocating PTE0 of the secondary PTEG first, and so on, is
recommended). Additionally, it may be desirable to place the PTEs that will require the
most frequent access at the beginning of a PTEG and reserve the PTEs in the secondary
PTEG for the least frequently accessed PTEs.


The architecture also allows multiple matching entries to be found within a table search
operation. Multiple matching PTEs are allowed if they meet the match criteria described
above and have identical RPN, WIMG, and PP values; they may differ in their R and C
bits. In this case, one of the matching PTEs is used and the R and C bits are updated
according to this PTE. If multiple PTEs are found that meet the match criteria but differ in
the RPN, WIMG, or PP fields, the translation is undefined and the resultant R and C bits in
the matching entries are also undefined.


Note that multiple matching entries can also differ in the setting of the H bit, but the H bit 
must be set according to whether the PTE was located in the primary or secondary PTEG, 
as described above. 
7.6.1.6 Page Table Structure Examples

Figure 7-22 shows the structure of an example page table. The base address of the page
table is defined by SDR1[HTABORG] concatenated with 16 zero bits. In this example, the
address is identified by bits 0-13 in SDR1[HTABORG]; note that bits 14 and 15 of
HTABORG must be zero because the low-order two bits of HTABMASK are ones. The
addresses for individual PTEGs within this page table are then defined by bits 14-25 as an
offset from bits 0-13 of this base address. Thus, the size of the page table is defined as 4096
PTEGs.
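The arithmetic in this example can be reproduced with a small C sketch that decodes an SDR1 value. It assumes HTABMASK has the required form (a run of 0 bits followed by a run of 1 bits), so the table holds (HTABMASK + 1) x 2^10 PTEGs; the struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t base;    /* HTABORG concatenated with 16 zero bits */
    uint32_t ptegs;   /* number of PTEGs in the table */
    uint32_t bytes;   /* table size, 64 bytes per PTEG */
    int      aligned; /* 1 if the HTABORG bits covered by HTABMASK are zero */
} pt_geometry;

pt_geometry decode_sdr1(uint32_t sdr1)
{
    pt_geometry g;
    uint32_t htaborg  = sdr1 >> 16;          /* SDR1 bits 0-15 */
    uint32_t htabmask = sdr1 & 0x1FFu;       /* SDR1 bits 23-31 */
    g.base    = htaborg << 16;
    g.ptegs   = (htabmask + 1u) << 10;       /* valid only for 0...01...1 masks */
    g.bytes   = g.ptegs * 64u;
    g.aligned = (htaborg & htabmask) == 0;   /* HTABORG bits 7-15 vs. mask bits */
    return g;
}
```

For the SDR1 value in Figure 7-22 (0xA600_0003), the sketch reports a base of 0xA600_0000, 4096 PTEGs, and a 256-Kbyte table.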


Example:
Given: SDR1 = 1010 0110 0000 0000 0000 0000 0000 0011
              (HTABORG = bits 0-15; HTABMASK = bits 23-31)

Base address: 0xA600_0000

Page table:
    0xA600_0000   PTEG0
                  ...
                  PTEGaddr1
                  ...
                  PTEGaddr2
                  ...
                  PTEG4095

(bits 0-13 from HTABORG; bits 14-25 from the hash; bits 26-31 = 0)
PTEGaddr1 = 1010 0110 0000 00mm aaaa aaaa aa00 0000
PTEGaddr2 = 1010 0110 0000 00nn bbbb bbbb bb00 0000

Figure 7-22. Example Page Table Structure 
Two example PTEG addresses are shown in the figure as PTEGaddr1 and PTEGaddr2. Bits
14-25 of each PTEG address in this example page table are derived from the output of the
hashing function (bits 26-31 are zero to start with PTE0 of the PTEG). In this example, the
‘b’ bits in PTEGaddr2 are the one’s complement of the ‘a’ bits in PTEGaddr1. The ‘n’ bits
are also the one’s complement of the ‘m’ bits, but these two bits are generated from bits 7-8
of the output of the hashing function, logically ORed with bits 14-15 of the HTABORG
field (which must be zero). If bits 14-25 of PTEGaddr1 were derived by using the primary
hashing function, then PTEGaddr2 corresponds to the secondary PTEG.


Note, however, that bits 14-25 in PTEGaddr2 can also be derived from a combination of
effective address bits, segment register bits, and the primary hashing function. In this case,
PTEGaddr1 corresponds to the secondary PTEG. Thus, while a PTEG may be
considered a primary PTEG for some effective addresses (and segment register bits), it may
also correspond to the secondary PTEG for a different effective address (and segment
register value).


It is the value of the H bit in each of the individual PTEs that identifies a particular PTE as 
either primary or secondary (there may be PTEs that correspond to a primary PTEG and 
PTEs that correspond to a secondary PTEG, all within the same physical PTEG address 
space). Thus, only the PTEs that have H = 0 are checked for a hit during a primary PTEG 
search. Likewise, only PTEs with H = 1 are checked in the case of a secondary PTEG 
search. 
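The primary/secondary relationship can be illustrated in C at the level of hash values. In this hypothetical sketch, access B uses a VSID whose low-order 19 bits are the complement of access A's, with the same page index; B's primary hash then equals A's secondary hash, so A's secondary PTEG is B's primary PTEG for any SDR1 setting. The H-bit test shows how a search keeps the two classes of PTEs apart. All function names are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define HASH_MASK 0x7FFFFu   /* 19-bit hash values */

static uint32_t hash_primary(uint32_t vsid, uint32_t pi)
{
    return (vsid & HASH_MASK) ^ (pi & 0xFFFFu);
}

static uint32_t hash_secondary(uint32_t hash1)
{
    return ~hash1 & HASH_MASK;
}

/* True if the secondary PTEG of access A is the primary PTEG of access B.
   Equal hash values give the same PTEG address for any SDR1 value. */
bool secondary_of_a_is_primary_of_b(uint32_t vsid_a, uint32_t pi_a,
                                    uint32_t vsid_b, uint32_t pi_b)
{
    return hash_secondary(hash_primary(vsid_a, pi_a)) == hash_primary(vsid_b, pi_b);
}

/* During a search, only PTEs whose H bit (word 0, bit 25) matches the kind
   of PTEG being searched are candidates. */
bool pte_in_search_class(uint32_t pte_word0, bool secondary_search)
{
    return ((pte_word0 >> 6) & 1u) == (secondary_search ? 1u : 0u);
}
```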


7.6.1.7 PTEG Address Mapping Examples 


This section contains two examples of an effective address and how its address translation 
(the PTE) maps into the primary PTEG in physical memory. The examples illustrate how 
the processor generates PTEG addresses for a table search operation; this is also the 
algorithm that must be used by the operating system in creating page tables. 


Figure 7-23 shows an example of PTEG address generation for a 32-bit implementation. In
the example, the value in SDR1 defines a page table at address 0x0F98_0000 that contains
8192 PTEGs. The highest-order four bits of the example effective address select segment
register 0 (SR0). The contents of SR0 are then used along with bits 4-31 of the effective
address to create the 52-bit virtual address.
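The split of the effective address and segment register in this example can be sketched in C. The struct and function names are hypothetical; the sketch assumes a memory segment (T = 0) with the VSID in segment register bits 8-31, and it also extracts the 6-bit API used later in PTE comparisons.

```c
#include <stdint.h>

typedef struct {
    uint32_t sr_index;    /* EA bits 0-3: selects SR0-SR15 */
    uint32_t page_index;  /* EA bits 4-19 (VA bits 24-39) */
    uint32_t byte_offset; /* EA bits 20-31 */
    uint32_t vsid;        /* SR bits 8-31 (VA bits 0-23) */
    uint32_t api;         /* high-order 6 bits of the page index */
} va_parts;

va_parts split_ea(uint32_t ea, const uint32_t sr[16])
{
    va_parts v;
    v.sr_index    = ea >> 28;
    v.page_index  = (ea >> 12) & 0xFFFFu;
    v.byte_offset = ea & 0xFFFu;
    v.vsid        = sr[v.sr_index] & 0xFFFFFFu;
    v.api         = v.page_index >> 10;
    return v;
}
```

Applied to the example values (EA = 0x00FF_A01B, SR0 = 0x20CA_701C), it gives a page index of 0x0FFA, a byte offset of 0x01B, a VSID of 0xCA701C, and an API of 3.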


To generate the address of the primary PTEG, bits 5-23 and bits 24-39 of the virtual
address are used as inputs into the primary hashing function (XOR) to generate hash
value 1. The low-order 13 bits of hash value 1 are then ORed with the high-order 16
bits of HTABORG (bits 13-15 should be zero) and concatenated with six low-order 0 bits,
defining the address of the primary PTEG (0x0F9F_F980).
Example:
Given: SDR1 = 0000 1111 1001 1000 0000 0000 0000 0111
              (HTABORG = bits 0-15 = 0x0F98; HTABMASK = bits 23-31 = 0x007)

EA  = 0000 0000 1111 1111 1010 0000 0001 1011 (0x00FF_A01B)
      bits 0-3: segment register select; bits 4-19: page index; bits 20-31: byte offset
SR0 = 0010 0000 1100 1010 0111 0000 0001 1100 (VSID = bits 8-31 = 0xCA701C)

Virtual address (VSID || page index || byte offset):
      1100 1010 0111 0000 0001 1100  0000 1111 1111 1010  0000 0001 1011

Primary hash:
      VA5-23 (low-order 19 bits of VSID):  010 0111 0000 0001 1100
  XOR 000 || VA24-39 (page index):         000 0000 1111 1111 1010
  =   Hash value 1:                        010 0111 1111 1110 0110
                                           (bits 0-8: 9 bits; bits 9-18: 10 bits)

Primary PTEG address:
      bits 0-12 from HTABORG; bits 13-25 from the low-order 13 bits of hash value 1;
      bits 26-31 = 0 (start at PTE0)
      0000 1111 1001 1111 1111 1001 1000 0000 = 0x0F9F_F980

Figure 7-23. Example Primary PTEG Address Generation 


Figure 7-24 shows the generation of the secondary PTEG address for this example. If the
secondary PTEG is required, the secondary hash function is performed and the low-order
13 bits of hash value 2 are then ORed with the high-order 16 bits of HTABORG (bits 13-15
should be zero), and concatenated with six low-order 0 bits, defining the address of the
secondary PTEG (0x0F98_0640).


As described in Figure 7-21, the 10 low-order bits of the page index field are always used
in the generation of a PTEG address (through the hashing function) for a 32-bit
implementation. This is why only the abbreviated page index (API) is defined for a PTE
(the entire page index field does not need to be checked). For a given effective address, (at
least) the low-order 10 bits of the page index contribute to the PTEG address (both primary
and secondary) where the corresponding PTE may reside in memory. Therefore, if the
high-order 6 bits of the page index (the API field as defined for 32-bit implementations)
match the API field of a PTE within the specified PTEG, the PTE mapping is guaranteed to
be the unique PTE required.


Hash value 1:     010 0111 1111 1110 0110
Secondary hash:   one's complement of hash value 1
Hash value 2:     101 1000 0000 0001 1001
                  (bits 0-8: 9 bits; bits 9-18: 10 bits)

Secondary PTEG address:
      bits 0-12 from HTABORG; bits 13-25 from the low-order 13 bits of hash value 2;
      bits 26-31 = 0 (start at PTE0)
      0000 1111 1001 1000 0000 0110 0100 0000 = 0x0F98_0640

Page table at 0x0F98_0000 (PTEG0 through PTEG8191):
  1) First compare the 8 PTEs (PTE0-PTE7) of PTEG8166 at 0x0F9F_F980.
  2) Then, if necessary, compare the 8 PTEs (PTE0-PTE7) of PTEG25 at 0x0F98_0640.

Figure 7-24. Example Secondary PTEG Address Generation 


Note that a given PTEG address does not map back to a unique effective address. Not only
can a given PTEG be considered both a primary and a secondary PTEG (as described in
Section 7.6.1.6, “Page Table Structure Examples”), but in this example, bits 24-26 of the
page index field of the virtual address are not used to generate the PTEG address. Therefore,
any of the eight combinations of these bits will map to the same primary PTEG address.
(However, these bits are part of the API and are therefore compared for each PTE within
the PTEG to determine if there is a hit.) Furthermore, an effective address can select a
different segment register with a different value such that the output of the primary (or
secondary) hashing function happens to equal the hash values shown in the example. Thus,
these effective addresses would also map to the same PTEG addresses shown.
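The aliasing of page index bits 24-26 in this example can be checked with a self-contained C sketch. It repeats minimal, hypothetical versions of the hash and PTEG-address computations and counts how many of the eight values of those three bits map to the same primary PTEG. With the example SDR1 (HTABMASK = 0x007) all eight alias; with a hypothetical SDR1 whose HTABMASK is 0x1FF, all 19 hash bits are used and only the original page index maps there.

```c
#include <stdint.h>

/* Minimal copies of the hash and PTEG-address sketches so this block stands alone. */
static uint32_t primary_hash(uint32_t vsid, uint32_t pi)
{
    return (vsid & 0x7FFFFu) ^ (pi & 0xFFFFu);
}

static uint32_t pteg_address(uint32_t sdr1, uint32_t hash)
{
    uint32_t htaborg = sdr1 >> 16, htabmask = sdr1 & 0x1FFu;
    uint32_t bits_7_15 = (htaborg & 0x1FFu) | (((hash >> 10) & 0x1FFu) & htabmask);
    return ((htaborg >> 9) << 25) | (bits_7_15 << 16) | ((hash & 0x3FFu) << 6);
}

/* Counts how many of the eight settings of page index bits 0-2 (VA24-26)
   produce the same primary PTEG address as page index pi. */
int primary_pteg_aliases(uint32_t sdr1, uint32_t vsid, uint32_t pi)
{
    uint32_t target = pteg_address(sdr1, primary_hash(vsid, pi));
    int count = 0;
    for (uint32_t top = 0; top < 8; top++) {
        uint32_t alias = (pi & 0x1FFFu) | (top << 13);   /* vary VA24-26 only */
        if (pteg_address(sdr1, primary_hash(vsid, alias)) == target)
            count++;
    }
    return count;
}
```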
7.6.2 Page Table Search Operation


An outline of the page table search process performed by a 32-bit implementation is as 
follows: 


1. The 32-bit physical addresses of the primary and secondary PTEGs are generated as
   described in Section 7.6.1.4.2, “Page Table Address Generation for 32-Bit
   Implementations.”

2. As many as 16 PTEs (from the primary and secondary PTEGs) are read from
   memory (the architecture does not specify the order of these reads, allowing
   multiple reads to occur in parallel). PTE reads occur with an implied WIM
   memory/cache mode control bit setting of 0b001. Therefore, they are considered
   cacheable.

3. The PTEs in the selected PTEGs are tested for a match with the virtual page number
   (VPN) of the access. The VPN is the VSID concatenated with the page index field
   of the virtual address. For a match to occur, the following must be true:
   - PTE[H] = 0 for primary PTEG; PTE[H] = 1 for secondary PTEG
   - PTE[V] = 1
   - PTE[VSID] = VA[0-23]
   - PTE[API] = VA[24-29]

4. If a match is not found within the eight PTEs of the primary PTEG and the eight
   PTEs of the secondary PTEG, an exception is generated as described in step 8. If a
   match (or multiple matches) is found, the table search process continues.

5. If multiple matches are found, all of the following must be true:
   - PTE[RPN] is equal for all matching entries
   - PTE[WIMG] is equal for all matching entries
   - PTE[PP] is equal for all matching entries

6. If one of the fields in step 5 does not match, the translation is undefined, and the
   R and C bits of the matching entries are undefined. Otherwise, the R and C bits are
   updated based on one of the matching entries.

7. A copy of the PTE is written into the on-chip TLB (if implemented) and the R bit is
   updated in the PTE in memory (if necessary). If there is no memory protection
   violation, the C bit is also updated in memory (if necessary) and the table search is
   complete.

8. If a match is not found within the primary or secondary PTEG, the search fails, and
   a page fault exception condition occurs (either an ISI or DSI exception).


Reads from memory for page table search operations are performed as unguarded,
cacheable operations in which coherency is required.
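The outline above can be modeled in C against a page table held in an ordinary array. This is a simplified, hypothetical sketch: it indexes PTEGs directly instead of forming physical addresses, ignores TLBs, memory protection, and C-bit updates, and always examines all 16 PTEs so that inconsistent duplicate matches are detected (the architecture leaves the order of the reads to the implementation). Types, names, and the PTE word layout comments follow the 32-bit PTE format but are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One PTE as two words (bit 0 = MSB):
   word 0: V (0), VSID (1-24), H (25), API (26-31)
   word 1: RPN (0-19), R (23), C (24), WIMG (25-28), PP (30-31) */
typedef struct { uint32_t w0, w1; } pte_t;

enum search_result { SEARCH_HIT, SEARCH_PAGE_FAULT, SEARCH_UNDEFINED };

#define HASH_MASK        0x7FFFFu     /* 19-bit hash values */
#define PTE_R            0x00000100u  /* word 1, bit 23 */
#define RPN_WIMG_PP_MASK 0xFFFFF07Bu  /* word 1: RPN, WIMG, and PP fields */

/* PTEG index within the table (HTABORG bits covered by HTABMASK are zero). */
static uint32_t pteg_index(uint32_t htabmask, uint32_t hash)
{
    return (((hash >> 10) & htabmask) << 10) | (hash & 0x3FFu);
}

/* table: (htabmask + 1) * 1024 PTEGs of eight PTEs each, in PTEG order. */
enum search_result page_table_search(pte_t *table, uint32_t htabmask,
                                     uint32_t vsid, uint32_t page_index,
                                     uint32_t *rpn_out)
{
    uint32_t api = (page_index >> 10) & 0x3Fu;               /* VA[24-29] */
    uint32_t hash1 = (vsid & HASH_MASK) ^ (page_index & 0xFFFFu);
    uint32_t hashes[2] = { hash1, ~hash1 & HASH_MASK };       /* steps 1-2 */
    pte_t *match = NULL;

    for (uint32_t h = 0; h < 2; h++) {        /* h is also the expected H bit */
        pte_t *pteg = &table[pteg_index(htabmask, hashes[h]) * 8u];
        for (int i = 0; i < 8; i++) {
            uint32_t w0 = pteg[i].w0;
            bool hit = (w0 >> 31) == 1u                        /* step 3 */
                    && ((w0 >> 7) & 0xFFFFFFu) == (vsid & 0xFFFFFFu)
                    && ((w0 >> 6) & 1u) == h
                    && (w0 & 0x3Fu) == api;
            if (!hit)
                continue;
            if (match == NULL)
                match = &pteg[i];
            else if ((match->w1 & RPN_WIMG_PP_MASK) != (pteg[i].w1 & RPN_WIMG_PP_MASK))
                return SEARCH_UNDEFINED;      /* steps 5-6: translation undefined */
        }
    }
    if (match == NULL)
        return SEARCH_PAGE_FAULT;             /* steps 4 and 8: ISI or DSI */
    match->w1 |= PTE_R;                       /* step 7: update the R bit */
    *rpn_out = match->w1 >> 12;
    return SEARCH_HIT;
}
```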
7.6.2.1 Flow for Page Table Search Operation

Figure 7-25 provides a detailed flow diagram of a page table search operation. Note that the 
references to TLBs are shown as optional because TLBs are not required; if they do exist, 
the specifics of how they are maintained are implementation-specific. Also, Figure 7-25 
shows only a few cases of R-bit and C-bit updates. For a complete list of the R- and C-bit 
updates dictated by the architecture, refer to Table 7-16. 
Page Table Search
1. Generate the primary and secondary PTEG addresses.
2. Fetch PTE(s) from the physical address(es); adjust the PA to read more PTE(s) as
   needed.
3. Check each PTE: PTE[VSID, API, V] = Seg Desc[VSID], EA[API], 1, and
   PTE[H] = 0 (primary PTEG) or PTE[H] = 1 (secondary PTEG). Otherwise, continue
   until all 16 PTEs have been checked.
4. If no PTE matches (page fault):
   - Instruction access: SRR1[1] ← 1; ISI exception.
   - Data access: DSISR[1] ← 1; DSI exception.
5. If PTE[RPN, WIMG, PP] are not equal for all matching PTEs, the translation is
   undefined, and the R and C bits for the matching PTEs are also undefined.
6. Otherwise, write the PTE into the TLB (implementation-specific) and update PTE[R]
   (if required).
7. Check memory protection violation conditions (see Figure 7-17):
   - Access prohibited: page memory protection violation (see Figure 7-15).
   - Access permitted, store operation with PTE[C] = 0: TLB[PTE[C]] ← 1
     (implementation-specific) and PTE[C] ← 1 (update PTE[C] in memory); page
     table search complete.
   - Access permitted, otherwise: page table search complete.

Figure 7-25. Page Table Search Flow 




7.6.3 Page Table Updates 


This section uses pseudocode examples to describe the requirements placed on software
when it updates page tables in memory. Multiprocessor systems must follow the rules
described in this section so that all processors operate with a consistent set of page tables.
Even single-processor systems must follow certain rules, because software changes must be
synchronized with the other instructions in execution and with automatic updates that may
be made by the hardware (referenced and changed bit updates). Updates to the tables
include the following operations:


• Adding a PTE
• Modifying a PTE, including modifying the R and C bits of a PTE
• Deleting a PTE


PTEs must be locked on multiprocessor systems. Access to PTEs must be appropriately
synchronized by software locking of (that is, guaranteeing exclusive access to) PTEs or
PTEGs if more than one processor can modify the table at that time. In the examples below,
software locking should be used to provide exclusive access to the PTE being updated.
However, the architecture does not dictate the specific protocol to be used for locking (for
example, a single lock, a lock per PTEG, or a lock per PTE can be used). See Appendix E,
“Synchronization Programming Examples,” for more information about the use of the
reservation instructions (such as the lwarx and stwcx. instructions) to perform software
locking.
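As a portable stand-in for a reservation-based lock, a per-PTEG spinlock can be sketched with C11 atomics. This is a swapped-in technique, not the lwarx/stwcx. sequences of Appendix E: the type and function names are hypothetical, and the compiler chooses the underlying instructions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock protecting one PTEG (a lock per PTE or a single global
   lock would work the same way). */
typedef struct { atomic_flag held; } pteg_lock;

/* Spins until the lock is obtained; acquire ordering keeps the PTE
   accesses that follow from moving above the lock. */
void pteg_lock_acquire(pteg_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* another processor holds the lock */
}

/* Single attempt; returns true if the lock was obtained. */
bool pteg_lock_try(pteg_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Release ordering makes the PTE updates visible before the lock is freed. */
void pteg_lock_release(pteg_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```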


When TLBs are implemented, they are defined as noncoherent caches of the page tables.
TLB entries must be invalidated explicitly with the TLB invalidate entry instruction (tlbie)
whenever the corresponding PTE is modified. In a multiprocessor system, the tlbie
instruction must be controlled by software locking, so that the tlbie is issued on only one
processor at a time.


The PowerPC OEA defines the tlbsync instruction, which ensures that TLB invalidate
operations executed by this processor have caused all appropriate actions in other
processors. In a system that contains multiple processors, the tlbsync functionality must be
used to ensure proper synchronization with the other PowerPC processors. Note
that a sync instruction must also follow the tlbsync to ensure that the tlbsync has
completed execution on this processor.


On single processor systems, PTEs need not be locked and the eieio instructions (in 
between the tlbie and tlbsync instructions) and the tlbsync instructions themselves are not 
required. The sync instructions shown are required even for single processor systems (to 
ensure that all previous changes to the page tables and all preceding tlbie instructions have 
completed). 


Any processor, including the processor modifying the page table, may access the page table 
at any time in an attempt to reload a TLB entry. An inconsistent PTE must never 
accidentally become visible (if V = 1); thus, there must be synchronization between 
modifications to the valid bit and any other modifications (to avoid corrupted data). 
In the pseudocode examples that follow, each change to a PTE that is shown as a single line
is assumed to be performed with an atomic store instruction. Appropriate modifications
must be made to these examples if this assumption is not satisfied.


Updates of R and C bits by the processor are not synchronized with the accesses that cause 
the updates. When modifying the low-order half of a PTE, software must take care to avoid 
overwriting a processor update of these bits and to avoid having the value written by a store 
instruction overwritten by a processor update. The processor does not alter any other fields 
of the PTE. 


Explicitly altering certain MSR bits (using the mtmsr instruction), or explicitly altering 
PTEs or certain system registers, may have the side effect of changing the effective or 
physical addresses from which the current instruction stream is being fetched. This kind of 
side effect is defined as an implicit branch. Therefore, PTEs must not be changed in a 
manner that causes an implicit branch. Section 2.3.17, “Synchronization Requirements for 
Special Registers and for Lookaside Buffers,” lists the possible implicit branch conditions 
that can occur when system registers and MSR bits are changed. 


For a complete list of the synchronization requirements for executing the MMU 
instructions, see Section 2.3.17, “Synchronization Requirements for Special Registers and 
for Lookaside Buffers.” 


The following examples show the required sequence of operations. However, other 
instructions may be interleaved within the sequences shown. 


7.6.3.1 Adding a Page Table Entry 


Adding a page table entry requires only a lock on the PTE in a multiprocessor system. The
second word of the PTE (RPN, R, C, WIMG, and PP) is written first (this example assumes
the old valid bit was cleared), the eieio instruction orders that update, and then the first
word, which sets V = 1, can be written. A sync instruction ensures that the updates have
been made to memory.


lock(PTE)
PTE[RPN,R,C,WIMG,PP] ← new values
eieio                               /* order 1st PTE update before 2nd */
PTE[VSID,H,API,V] ← new values (V = 1)
sync                                /* ensure updates completed */
unlock(PTE)
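The ordering requirement in this sequence can be modeled with C11 atomics for illustration. The sketch assumes a single-processor system (so no lock), a PTE whose V bit is already 0, and a hypothetical shared_pte type; a release store stands in for the eieio that orders the two word writes, and a sequentially consistent fence stands in for sync. These C11 operations are an analogy only, not a substitute for the eieio and sync instructions on PowerPC hardware.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of one PTE in shared memory. */
typedef struct {
    _Atomic uint32_t w0;   /* V, VSID, H, API */
    _Atomic uint32_t w1;   /* RPN, R, C, WIMG, PP */
} shared_pte;

/* Adds a PTE whose V bit is currently 0 (single-processor case, no lock). */
void add_pte(shared_pte *p, uint32_t new_w1, uint32_t new_w0)
{
    /* PTE[RPN,R,C,WIMG,PP] <- new values */
    atomic_store_explicit(&p->w1, new_w1, memory_order_relaxed);
    /* Release ordering plays the role of eieio: word 1 is ordered before
       the store of word 0, which sets V = 1. */
    atomic_store_explicit(&p->w0, new_w0 | 0x80000000u, memory_order_release);
    /* Stands in for the final sync. */
    atomic_thread_fence(memory_order_seq_cst);
}
```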
7.6.3.2 Modifying a Page Table Entry


This section describes several scenarios for modifying a PTE. 


7.6.3.2.1 General Case 

Consider the general case where a currently-valid PTE must be changed. To do this, the 
PTE must be locked, marked invalid, updated, invalidated from the TLB, marked valid 
again, and unlocked. The sync instruction must be used at appropriate times to wait for 
modifications to complete. 


Note that the tlbsync and the sync instruction that follows it are only required if software
consistency must be maintained with other PowerPC processors in a multiprocessor system
(and the software is to be used in a multiprocessor environment).


lock(PTE)
PTE[V] ← 0                          /* (other fields don't matter) */
sync                                /* ensure update completed */
PTE[RPN,R,C,WIMG,PP] ← new values
tlbie(old_EA)                       /* invalidate old translation */
eieio                               /* order before tlbsync and order 2nd PTE update before 3rd */
PTE[VSID,H,API,V] ← new values (V = 1)
tlbsync                             /* ensure tlbie completed on all processors */
sync                                /* ensure tlbsync and last update completed */
unlock(PTE)


7.6.3.2.2 Clearing the Referenced (R) Bit 
When the PTE is modified only to clear the R bit to 0, a much simpler algorithm suffices 
because the R bit need not be maintained exactly. 


lock(PTE)
oldR ← PTE[R]                       /* get old R */
if oldR = 1, then
    PTE[R] ← 0                      /* store byte (R = 0, other bits unchanged) */
    tlbie(PTE)                      /* invalidate entry */
    eieio                           /* order tlbie before tlbsync */
    tlbsync                         /* ensure tlbie completed on all processors */
    sync                            /* ensure tlbsync and update completed */
unlock(PTE)


Since only the R and C bits are modified by the processor, and since they reside in different
bytes, the R bit can be cleared by reading the current contents of the byte in the PTE
containing R (bits 16-23 of the second word), ANDing the value with 0xFE, and storing
the byte back into the PTE.
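The byte-granular update can be sketched in C over the PTE's eight bytes in big-endian (memory) order. R is bit 23 of the second word, so it is the least significant bit of byte 6; C is bit 24, the most significant bit of byte 7. The helper names are hypothetical. Under the C11 memory model each byte is a separate memory location, so a compiler should not widen this byte store into one that also rewrites byte 7, but code that relies on that should still confirm that a single byte store is emitted.

```c
#include <stdint.h>

/* Byte positions within an 8-byte PTE stored in big-endian (memory) order. */
#define PTE_R_BYTE 6   /* word 1 bits 16-23; R (bit 23) is the byte's LSB */
#define PTE_C_BYTE 7   /* word 1 bits 24-31; C (bit 24) is the byte's MSB */

unsigned pte_r(const uint8_t pte[8]) { return pte[PTE_R_BYTE] & 0x01u; }
unsigned pte_c(const uint8_t pte[8]) { return (pte[PTE_C_BYTE] >> 7) & 0x01u; }

/* Clears R by rewriting only the byte that contains it, so the byte holding
   C (which the processor may update) is never stored to. */
void clear_r_bit(uint8_t pte[8])
{
    pte[PTE_R_BYTE] &= 0xFEu;
}
```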
7.6.3.2.3 Modifying the Virtual Address


If the virtual address is being changed to a different address within the same hash class 
(primary or secondary), the following flow suffices: 


lock(PTE)
PTE[VSID,API,H,V] ← new values (V = 1)
sync                                /* ensure update completed */
tlbie(old_EA)                       /* invalidate old translation */
eieio                               /* order tlbie before tlbsync */
tlbsync                             /* ensure tlbie completed on all processors */
sync                                /* ensure tlbsync completed */
unlock(PTE)


In this pseudocode flow, the tlbsync and the sync instruction that follows it are only
required if consistency must be maintained with other PowerPC processors in a
multiprocessor system (and the software is to be used in a multiprocessor environment).


In this example, if the new address is not a cache synonym (alias) of the old address, care 
must be taken to also flush (or invalidate) from an on-chip cache any cache synonyms for 
the page. Thus, a temporary virtual address that is a cache synonym with the page whose 
PTE is being modified can be assigned and then used for the cache flushing (or 
invalidation). 


To modify the WIMG or PP bits without overwriting an R or C bit update being performed
by the processor, a sequence similar to the one shown above can be used, except that the
second line is replaced by a loop containing an lwarx/stwcx. instruction pair that emulates
an atomic compare and swap of the low-order word of the PTE.
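A C11 sketch of this compare-and-swap loop follows, with atomic_compare_exchange_weak standing in for the lwarx/stwcx. pair. It assumes the low-order word of the PTE is accessible as a hypothetical _Atomic uint32_t, with WIMG in bits 25-28 and PP in bits 30-31; if another agent sets R or C between the load and the exchange, the exchange fails, the local copy is refreshed, and the loop retries with the updated value.

```c
#include <stdatomic.h>
#include <stdint.h>

#define PTE_WIMG_MASK 0x00000078u   /* word 1 bits 25-28 */
#define PTE_PP_MASK   0x00000003u   /* word 1 bits 30-31 */

/* Replaces WIMG and PP in PTE word 1 without losing a concurrent R or C
   update: on failure, old is reloaded with the current contents. */
void set_wimg_pp(_Atomic uint32_t *w1, uint32_t wimg, uint32_t pp)
{
    uint32_t new_bits = ((wimg << 3) & PTE_WIMG_MASK) | (pp & PTE_PP_MASK);
    uint32_t old = atomic_load_explicit(w1, memory_order_relaxed);
    uint32_t desired;
    do {
        desired = (old & ~(PTE_WIMG_MASK | PTE_PP_MASK)) | new_bits;
    } while (!atomic_compare_exchange_weak(w1, &old, desired));
}
```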


7.6.3.3 Deleting a Page Table Entry 


In this example, the entry is locked, marked invalid, invalidated in the TLB, and unlocked. 


Again, note that the tlbsync and the sync instruction that follows it are only required if
consistency must be maintained with other PowerPC processors in a multiprocessor system
(and the software is to be used in a multiprocessor environment).


lock(PTE)
PTE[V] ← 0                          /* (other fields don't matter) */
sync                                /* ensure update completed */
tlbie(old_EA)                       /* invalidate old translation */
eieio                               /* order tlbie before tlbsync */
tlbsync                             /* ensure tlbie completed on all processors */
sync                                /* ensure tlbsync completed */
unlock(PTE)
7.6.4 Segment Register Updates


Synchronization requirements for using the move to segment register instructions are 
described in Section 2.3.17, “Synchronization Requirements for Special Registers and for 
Lookaside Buffers.” 


7.7 Direct-Store Segment Address Translation 


As described for memory segments, all accesses generated by the processor (with
translation enabled) that do not map to a BAT area map to a segment descriptor. If T = 1
for the selected segment descriptor, the access maps to the direct-store interface, invoking
a specific bus protocol for accessing I/O devices.


Direct-store segments are provided for POWER compatibility. As the direct-store interface 
is present only for compatibility with existing I/O devices that used this interface and the 
direct-store interface protocol is not optimized for performance, its use is discouraged. This 
functionality is considered optional (to allow for those earlier devices that implemented it). 
However, future devices are not likely to support it. Thus, software should not depend on 
its results and new software should not use it. Applications that require low-latency 
load/store access to external address space should use memory-mapped I/O, rather than the 
direct-store interface. 


7.7.1 Segment Descriptors for Direct-Store Segments 


The format of many of the fields in the segment descriptors depends on the value of the 
T bit. In 32-bit implementations, the segment descriptors reside in one of 16 on-chip 
segment registers. Figure 7-26 shows the register format for the segment registers when the 
T bit is set. 


  T  | Ks | Kp | BUID  | CNTLR_SPEC
  0  | 1  | 2  | 3-11  | 12-31

Figure 7-26. Segment Register Format for Direct-Store Segments 


Table 7-22 shows the bit definitions for the segment registers when the T bit is set for 32-bit 
implementations. 


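Table 7-22. Segment Register Bit Definitions for Direct-Store Segments

Bit     Name         Description
0       T            T = 1 selects the direct-store segment format
1       Ks           Supervisor-state protection key
2       Kp           User-state protection key
3-11    BUID         Bus unit ID
12-31   CNTLR_SPEC   Device-specific data for I/O controller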







7.7.2 Direct-Store Segment Accesses 


When the address translation process determines that the segment descriptor has T = 1,
direct-store segment address translation is selected; no reference is made to the page tables,
and neither the referenced nor the changed bit is updated. These accesses are performed as if
the WIMG bits were 0b0101; that is, caching is inhibited, the accesses bypass the cache,
hardware-enforced coherency is not required, and the accesses are considered guarded.


The specific protocol invoked to perform these accesses involves the transfer of address and 
data information; however, the PowerPC OEA does not define the exact hardware protocol 
used for direct-store accesses. Some instructions may cause multiple address/data 
transactions to occur on the bus. In this case, the address for each transaction is handled 
individually with respect to the MMU. 


The following describes the data that is typically sent to the memory controller by 
processors that implement the direct-store function: 


• One of the Kx bits (Ks or Kp) is selected to be the key as follows: 
  - For supervisor accesses (MSR[PR] = 0), the Ks bit is used and Kp is ignored. 
  - For user accesses (MSR[PR] = 1), the Kp bit is used and Ks is ignored. 

• An implementation-dependent portion of the segment descriptor. 

• An implementation-dependent portion of the effective address. 
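The C sketch below restates two rules from this section: the key selection listed above, and the WIMG attributes (0b0101) that every direct-store access is treated as having. The function and constant names are hypothetical. The sketch does not model any bus protocol, because the OEA leaves that protocol undefined.

```c
#include <stdbool.h>

/* WIMG value assumed for every direct-store access: 0b0101. */
enum { DIRECT_STORE_WIMG = 0x5 };

static bool wimg_write_through(unsigned wimg)      { return (wimg >> 3) & 1u; } /* W */
static bool wimg_caching_inhibited(unsigned wimg)  { return (wimg >> 2) & 1u; } /* I */
static bool wimg_coherency_required(unsigned wimg) { return (wimg >> 1) & 1u; } /* M */
static bool wimg_guarded(unsigned wimg)            { return wimg & 1u; }        /* G */

/* Select the key bit sent to the memory controller.
   msr_pr is MSR[PR]: 0 = supervisor access (use Ks), 1 = user access (use Kp). */
static unsigned direct_store_key(unsigned msr_pr, unsigned ks, unsigned kp)
{
    return msr_pr ? kp : ks;
}
```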


7.7.3 Direct-Store Segment Protection 


Page-level memory protection as described in Section 7.5.4, “Page Memory Protection,” is 
not provided for direct-store segments. The appropriate key bit (Ks or Kp) from the segment 
descriptor is sent to the memory controller, and the memory controller implements any 
protection required. Frequently, no such mechanism is provided; the fact that a direct-store 
segment is mapped into the address space of a process may be regarded as sufficient 
authority to access the segment. 


7.7.4 Instructions Not Supported in Direct-Store Segments 

The following instructions are not supported at all and cause either a DSI exception or 
boundedly-undefined results when issued with an effective address that selects a segment 
descriptor that has T = 1: 


• lwarx 
• stwcx. 
• eciwx 
• ecowx 

7.7.5 Instructions with No Effect in Direct-Store Segments 


The following instructions are executed as no-ops when issued with an effective address 
that selects a segment where T = 1: 


• dcba 
• dcbt 
• dcbtst 
• dcbf 
• dcbi 
• dcbst 
• dcbz 
• icbi 


7.7.6 Direct-Store Segment Translation Summary Flow 


Figure 7-27 shows the flow used by the MMU when direct-store segment address 
translation is selected. This figure expands the Direct-Store Segment Translation stub found 
in Figure 7-4 for both instruction and data accesses. In the case of a floating-point load or 
store operation to a direct-store segment, it is implementation-specific whether the 
alignment exception occurs. In the case of an eciwx, ecowx, lwarx, or stwcx. instruction, 
the implementation either sets the DSISR as shown and causes the DSI exception, or causes 
boundedly-undefined results. 
Figure 7-27. Direct-Store Segment Translation Flow 
Chapter 8 
Instruction Set 


This chapter lists the PowerPC instruction set in alphabetical order by mnemonic. Each 
entry includes the instruction formats and a quick reference ‘legend’. The legend gives 
the level(s) of the PowerPC architecture in which the instruction may be found: user 
instruction set architecture (UISA), virtual environment architecture (VEA), or 
operating environment architecture (OEA). It also gives the privilege level of the 
instruction, user- or supervisor-level; an instruction is assumed to be user-level unless the 
legend specifies that it is supervisor-level. The format diagrams show, horizontally, all 
valid combinations of instruction fields; for a graphical representation of these instruction 
formats, see Appendix A, “PowerPC Instruction Set Listings.” The legend also indicates 
whether the instruction is 32-bit and/or optional. A description of the instruction fields 
and pseudocode conventions is also provided. For more information on the PowerPC 
instruction set, refer to Chapter 4, “Addressing Modes and Instruction Set Summary.” 


Note that the architecture specification refers to user-level and supervisor-level as problem 
state and privileged state, respectively. 


8.1 Instruction Formats 


Instructions are four bytes long and word-aligned, so when instruction addresses are 
presented to the processor (as in branch instructions) the two low-order bits are ignored. 
Similarly, whenever the processor develops an instruction address, its two low-order bits 
are zero. 


Bits 0-5 always specify the primary opcode. Many instructions also have an extended 
opcode. The remaining bits of the instruction contain one or more fields for the different 
instruction formats. 
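The manual numbers bits from 0 at the most-significant end. To extract a field such as the primary opcode (bits 0-5), shift right by 31 minus the field's last bit and then mask. The C sketch below assumes that numbering; insn_field, primary_opcode, and align_insn_addr are hypothetical helper names. It also shows the two low-order bits of an instruction address being ignored.

```c
#include <stdint.h>

/* Extract bits first..last (inclusive) of an instruction word, where
   bit 0 is the most-significant bit (the manual's numbering). */
static uint32_t insn_field(uint32_t insn, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (insn >> (31 - last)) & mask;
}

/* The primary opcode always occupies bits 0-5. */
static uint32_t primary_opcode(uint32_t insn)
{
    return insn_field(insn, 0, 5);
}

/* Instructions are word-aligned, so the two low-order address bits are ignored. */
static uint32_t align_insn_addr(uint32_t addr)
{
    return addr & ~(uint32_t)3;
}
```

For example, 0x7C642A14 is the encoding of add r3,r4,r5. insn_field returns 31 for bits 0-5; 3, 4, and 5 for the D, A, and B fields; and 266 for the extended opcode in bits 22-30.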


Some instruction fields are reserved or must contain a predefined value as shown in the 
individual instruction layouts. If a reserved field does not have all bits cleared, or if a field 
that must contain a particular value does not contain that value, the instruction form is 
invalid and the results are as described in Chapter 4, “Addressing Modes and Instruction Set 
Summary.” 
8.1.1 Split-Field Notation 


Some instruction fields occupy more than one contiguous sequence of bits or occupy a 
contiguous sequence of bits used in permuted order. Such a field is called a split field. Split 
fields that represent the concatenation of the sequences from left to right are shown in 
lowercase letters. These split fields, spr and tbr, are described in Table 8-1. 


Table 8-1. Split-Field Notation and Conventions 


Field         Description
spr (11-20)   This field is used to specify a special-purpose register for the mtspr and mfspr instructions. 
              The encoding is described in Section 4.4.2.2, “Move to/from Special-Purpose Register 
              Instructions (OEA).” 
tbr (11-20)   This field is used to specify either the time base lower (TBL) or time base upper (TBU). 
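spr is a familiar split field. In the architecture's mtspr/mfspr descriptions, the SPR number is formed as spr[5-9] || spr[0-4]. The two 5-bit halves of instruction bits 11-20 are therefore swapped relative to the SPR number. The C sketch below decodes and encodes the field; the helper names are hypothetical.

```c
#include <stdint.h>

/* SPR number from an mfspr/mtspr instruction word: spr[5-9] || spr[0-4]. */
static unsigned spr_number(uint32_t insn)
{
    unsigned spr_0_4 = (insn >> 16) & 0x1F; /* instruction bits 11-15 */
    unsigned spr_5_9 = (insn >> 11) & 0x1F; /* instruction bits 16-20 */
    return (spr_5_9 << 5) | spr_0_4;
}

/* Inverse: the split spr field, already positioned in bits 11-20, for SPR number n. */
static uint32_t spr_field(unsigned n)
{
    return ((uint32_t)(n & 0x1F) << 16) | ((uint32_t)((n >> 5) & 0x1F) << 11);
}
```

For instance, mflr r0 (that is, mfspr r0,8) encodes as 0x7C0802A6, and spr_number returns 8 for that word.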


Split fields that represent the concatenation of the sequences in some order, which need not 
be left to right (as described for each affected instruction), are shown in uppercase letters. 
These split fields, MB, ME, and SH, are described in Table 8-2. 


8.1.2 Instruction Fields 


Table 8-2 describes the instruction fields used in the various instruction formats. 


Table 8-2. Instruction Syntax Conventions 


Field          Description
AA (30)        Absolute address bit. 
               0  The immediate field represents an address relative to the current instruction address (CIA). 
                  (For more information on the CIA, see Table 8-3.) The effective (logical) address of the 
                  branch is either the sum of the LI field sign-extended to 32 bits and the address of the branch 
                  instruction or the sum of the BD field sign-extended to 32 bits and the address of the branch 
                  instruction. 
               1  The immediate field represents an absolute address. The effective address (EA) of the branch 
                  is the LI field sign-extended to 32 bits or the BD field sign-extended to 32 bits. 
BD (16-29)     Immediate field specifying a 14-bit signed two's complement branch displacement that is 
               concatenated on the right with 0b00 and sign-extended to 32 bits. 
BI (11-15)     This field is used to specify a bit in the CR to be used as the condition of a branch conditional 
               instruction. 
BO (6-10)      This field is used to specify options for the branch conditional instructions. The encoding is 
               described in Section 4.2.4.2, “Conditional Branch Control.” 
crbA (11-15)   This field is used to specify a bit in the CR to be used as a source. 
crbB (16-20)   This field is used to specify a bit in the CR to be used as a source. 
crbD (6-10)    This field is used to specify a bit in the CR, or in the FPSCR, as the destination of the result of an 
               instruction. 
crfD (6-8)     This field is used to specify one of the CR fields, or one of the FPSCR fields, as a destination. 
crfS (11-13)   This field is used to specify one of the CR fields, or one of the FPSCR fields, as a source. 
CRM (12-19)    This field mask is used to identify the CR fields that are to be updated by the mtcrf instruction. 
Table 8-2. Instruction Syntax Conventions (Continued) 
Field          Description
d (16-31)      Immediate field specifying a 16-bit signed two's complement integer that is sign-extended to 32 
               bits. 
FM (7-14)      This field mask is used to identify the FPSCR fields that are to be updated by the mtfsf instruction. 
frS (6-10)     This field is used to specify an FPR as a source. 
IMM (16-19)    Immediate field used as the data to be placed into a field in the FPSCR. 
LI (6-29)      Immediate field specifying a 24-bit signed two's complement integer that is concatenated on the 
               right with 0b00 and sign-extended to 32 bits. 
LK (31)        Link bit. 
               0  Does not update the link register (LR). 
               1  Updates the LR. If the instruction is a branch instruction, the address of the instruction 
                  following the branch instruction is placed into the LR. 
MB (21-25) and ME (26-30) 
               These fields are used in rotate instructions to specify a 32-bit mask as described in 
               Section 4.2.1.4, “Integer Rotate and Shift Instructions.” 
rB (16-20)     This field is used to specify a GPR to be used as a source. 
Rc (31)        Record bit. 
               0  Does not update the condition register (CR). 
               1  Updates the CR to reflect the result of the operation. 
                  For integer instructions, CR bits 0-2 are set to reflect the result as a signed quantity and CR 
                  bit 3 receives a copy of the summary overflow bit, XER[SO]. The result as an unsigned quantity 
                  or a bit string can be deduced from the EQ bit. For floating-point instructions, CR bits 4-7 are 
                  set to reflect floating-point exception, floating-point enabled exception, floating-point invalid 
                  operation exception, and floating-point overflow exception. 
                  (Note that exceptions are referred to as interrupts in the architecture specification.) 
rD (6-10)      This field is used to specify a GPR to be used as a destination. 
rS (6-10)      This field is used to specify a GPR to be used as a source. 
SH (16-20)     This field is used to specify a shift amount. 
SIMM (16-31)   This immediate field is used to specify a 16-bit signed integer. 
Table 8-2. Instruction Syntax Conventions (Continued) 
Field          Description
SR (12-15)     This field is used to specify one of the 16 segment registers. 
TO (6-10)      This field is used to specify the conditions on which to trap. The encoding is described in 
               Section 4.2.4.6, “Trap Instructions.” 
UIMM (16-31)   This immediate field is used to specify a 16-bit unsigned integer. 
XO (21-30, 22-30, 26-30) 
               Extended opcode field. 


8.1.3 Notation and Conventions 


The operation of some instructions is described by a semiformal language (pseudocode). 
See Table 8-3 for a list of pseudocode notation and conventions used throughout this 
chapter. 





Table 8-3. Notation and Conventions 


Notation/Convention   Meaning
. (period)   Update. When used as a character of an instruction mnemonic, a period (.) means that the 
             instruction updates the condition register field. 
c            Carry. When used as a character of an instruction mnemonic, a ‘c’ indicates a carry out in 
             XER[CA]. 
e            Extended Precision. When used as the last character of an instruction mnemonic, an ‘e’ 
             indicates the use of XER[CA] as an operand in the instruction and records a carry out in 
             XER[CA]. 
o            Overflow. When used as a character of an instruction mnemonic, an ‘o’ indicates the record of 
             an overflow in XER[OV] and CR0[SO] for integer instructions or CR1[SO] for floating-point 
             instructions. 
Table 8-3. Notation and Conventions (Continued) 
Notation/Convention   Meaning
⊕, ≡         Exclusive-OR, Equivalence logical operators (for example, (a ≡ b) = (a ⊕ ¬b)) 
0bnnnn       A number expressed in binary format. 
0xnnnn       A number expressed in hexadecimal format. 
(n)x         The replication of x, n times (that is, x concatenated to itself n - 1 times). 
             (n)0 and (n)1 are special cases. A description of the special cases follows: 
             • (n)0 means a field of n bits with each bit equal to 0. Thus (5)0 is equivalent to 0b00000. 
             • (n)1 means a field of n bits with each bit equal to 1. Thus (5)1 is equivalent to 0b11111. 
(rA|0)       The contents of rA if the rA field has the value 1-31, or the value 0 if the rA field is 0. 
(rX)         The contents of rX 
|x|          Absolute value of x 
CIA          Current instruction address. 
             The 32-bit address of the instruction being described by a sequence of pseudocode. Used by 
             relative branches to set the next instruction address (NIA) and by branch instructions with 
             LK = 1 to set the link register. Does not correspond to any architected register. 
Clear        Clear the leftmost or rightmost n bits of a register to 0. This operation is used for rotate and 
             shift instructions. 
Clear left and shift left 
             Clear the leftmost b bits of a register, then shift the register left by n bits. This operation can 
             be used to scale a known non-negative array index by the width of an element. These 
             operations are used for rotate and shift instructions. 
Do           Do loop. 
             • Indenting shows range. 
             • “To” and/or “by” clauses specify incrementing an iteration variable. 
             • “While” clauses give termination conditions. 
DOUBLE(x)    Result of converting x from floating-point single-precision format to floating-point 
             double-precision format. 
Extract      Select a field of n bits starting at bit position b in the source register, right or left justify this 
             field in the target register, and clear all other bits of the target register to zero. This operation 
             is used for rotate and shift instructions. 
EXTS(x)      Result of extending x on the left with sign bits 
GPR(x)       General-purpose register x 
if...then...else... 
             Conditional execution, indenting shows range, else is optional. 
Table 8-3. Notation and Conventions (Continued) 
Notation/Convention   Meaning
Insert       Select a field of n bits in the source register, insert this field starting at bit position b of the 
             target register, and leave other bits of the target register unchanged. (No simplified 
             mnemonic is provided for insertion of a field when operating on double words; such an 
             insertion requires more than one instruction.) This operation is used for rotate and shift 
             instructions. (Note that simplified mnemonics are referred to as extended mnemonics in the 
             architecture specification.) 
Leave        Leave innermost do loop, or the do loop described in the leave statement. 
MASK(x, y)   Mask having ones in positions x through y (wrapping if x > y) and zeros elsewhere. 
MEM(x, y)    Contents of y bytes of memory starting at address x. 
NIA          Next instruction address, which is the 32-bit address of the next instruction to be executed 
             (the branch destination) after a successful branch. In pseudocode, a successful branch is 
             indicated by assigning a value to NIA. For instructions that do not branch, the next 
             instruction address is CIA + 4. Does not correspond to any architected register. 
OEA          PowerPC operating environment architecture 
Rotate       Rotate the contents of a register right or left n bits without masking. This operation is used for 
             rotate and shift instructions. 
ROTL[64](x, y)  Result of rotating the 64-bit value x left y positions 
ROTL[32](x, y)  Result of rotating the 64-bit value x || x left y positions, where x is 32 bits long 
Set          Bits are set to 1. 
Shift        Shift the contents of a register right or left n bits, clearing vacated bits (logical shift). This 
             operation is used for rotate and shift instructions. 
SINGLE(x)    Result of converting x from floating-point double-precision format to floating-point 
             single-precision format. 
undefined    An undefined value. The value may vary from one implementation to another, and from one 
             execution to another on the same implementation. 
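The C sketch below applies MASK(x, y) and ROTL[32](x, y) to 32-bit operands. For a 32-bit value, both halves of ROTL[32] equal a plain 32-bit rotate, and rotl32 computes that rotate. rotate_and_mask rotates and then ANDs with a mask, which mirrors rotate instructions that take MB and ME operands, such as rlwinm. All function names are hypothetical.

```c
#include <stdint.h>

/* ROTL on a 32-bit value: rotate x left by y positions. */
static uint32_t rotl32(uint32_t x, unsigned y)
{
    y &= 31;
    return y ? (x << y) | (x >> (32 - y)) : x;
}

/* MASK(x, y) for 32-bit operands (x, y in 0..31, bit 0 = most significant):
   ones in positions x through y, wrapping around when x > y. */
static uint32_t mask32(unsigned x, unsigned y)
{
    uint32_t from_x = 0xFFFFFFFFu >> x;       /* ones in bits x..31 */
    uint32_t to_y = 0xFFFFFFFFu << (31 - y);  /* ones in bits 0..y */
    return (x <= y) ? (from_x & to_y) : (from_x | to_y);
}

/* Rotate, then keep only the bits selected by MASK(mb, me). */
static uint32_t rotate_and_mask(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & mask32(mb, me);
}
```

For example, MASK(30, 1) wraps and yields 0xC0000003. Rotating 0x12345678 left by 8 and masking with MASK(24, 31) extracts the top byte, 0x12.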
Table 8-4 describes instruction field notation conventions used throughout this chapter. 


Table 8-4. Instruction Field Conventions 


PowerPC Architecture Specification   Equivalent in This Manual
BF, BFA                              crfD, crfS (respectively)
FRA, FRB, FRC, FRT, FRS              frA, frB, frC, frD, frS (respectively)
FXM                                  CRM
RA, RB, RT, RS                       rA, rB, rD, rS (respectively)


Precedence rules for pseudocode operators are summarized in Table 8-5. 





Table 8-5. Precedence Rules 


Operators                                      Associativity
x[n], function evaluation                      Left to right
(n)x or replication, x(n) or exponentiation    Right to left
unary -, ¬                                     Right to left
*, ÷                                           Left to right
+, -                                           Left to right
||                                             Left to right
=, ≠, <, ≤, >, ≥, <U, >U, ?                    Left to right
&, ⊕, ≡                                        Left to right
|                                              Left to right
- (range)                                      None
←, ←iea                                        None
Operators higher in Table 8-5 are applied before those lower in the table. Operators at the 
same level in the table associate from left to right, from right to left, or not at all, as shown. 
For example, “-” (binary minus) associates from left to right, so a - b - c = (a - b) - c. 
Parentheses are used to override the evaluation order implied by Table 8-5, or to increase 
clarity; parenthesized expressions are evaluated before serving as operands. 





8.1.4 Computation Modes 
The PowerPC architecture allows for the following types of implementations: 


• 64-bit implementations, in which all registers except some special-purpose registers 
(SPRs) are 64 bits long and effective addresses are 64 bits long. All 64-bit 
implementations have two modes of operation: 64-bit mode (which is the default) 
and 32-bit mode. The mode controls how the effective address is interpreted, how 
condition bits are set, and how the count register (CTR) is tested by branch 
conditional instructions. All instructions provided for 64-bit implementations are 
available in both 64- and 32-bit modes. 

• 32-bit implementations, in which all registers except the FPRs are 32 bits long and 
effective addresses are 32 bits long. 


Note that all pseudocode examples provided in this chapter are for 32-bit 
implementations. For more information on 64-bit and 32-bit modes, refer to Section 1.1.1, 
“The 64-Bit PowerPC Architecture and the 32-Bit Subset.” 
8.2 PowerPC Instruction Set 


The remainder of this chapter lists and describes the instruction set for the PowerPC 
architecture. The instructions are listed in alphabetical order by mnemonic. Figure 8-1 
shows the format for each instruction description page. 


[Figure 8-1 shows a sample description page (the addx entry) with callouts identifying the 
instruction name, instruction syntax, equivalent POWER mnemonics, instruction encoding, 
pseudocode description of instruction operation, text description of instruction operation, 
registers altered by instruction, and quick reference legend.] 
Figure 8-1. Instruction Description 





Note that the execution unit that executes the instruction may not be the same for all 
PowerPC processors. 


addx addx 


Add 

add rD,rA,rB (OE = 0 Rc = 0) 
add. rD,rA,rB (OE = 0 Rc = 1) 
addo rD,rA,rB (OE = 1 Rc = 0) 
addo. rD,rA,rB (OE = 1 Rc = 1) 

[POWER mnemonics: cax, cax., caxo, caxo.] 

Encoding: 31 (0-5) | D (6-10) | A (11-15) | B (16-20) | OE (21) | 266 (22-30) | Rc (31) 

rD ← (rA) + (rB) 

The sum (rA) + (rB) is placed into rD. 
The add instruction is preferred for addition because it sets few status bits. 

Other registers altered: 
• Condition Register (CR0 field): 
Affected: LT, GT, EQ, SO (if Rc = 1) 
Note: CR0 field may not reflect the infinitely precise result if overflow occurs (see 
XER below). 
• XER: 
Affected: SO, OV (if OE = 1) 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XO 
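The note above says CR0 may not reflect the infinitely precise result, and a small C model of addo. (OE = 1, Rc = 1) shows why. The struct and function names are hypothetical. The model assumes a 32-bit implementation, treats OV as signed two's-complement overflow, and keeps SO sticky.

```c
#include <stdint.h>

/* Hypothetical result record for the registers addo. writes. */
struct add_result {
    uint32_t rd;  /* sum, modulo 2^32 */
    unsigned ov;  /* XER[OV] */
    unsigned so;  /* XER[SO] (sticky) */
    unsigned cr0; /* 4-bit CR0 field: LT GT EQ SO, LT in the most-significant position */
};

static struct add_result addo_dot(uint32_t ra, uint32_t rb, unsigned xer_so)
{
    struct add_result r;
    r.rd = ra + rb;
    /* Signed overflow: both operands have the same sign and the result's sign differs. */
    r.ov = (unsigned)((~(ra ^ rb) & (ra ^ r.rd)) >> 31);
    r.so = (xer_so & 1u) | r.ov;
    /* CR0 compares the 32-bit result, interpreted as signed, with zero. */
    unsigned c = (r.rd >> 31) ? 0x4u : (r.rd != 0) ? 0x2u : 0x1u; /* LT, GT, EQ */
    r.cr0 = (c << 1) | r.so;
    return r;
}
```

Adding 1 to 0x7FFF_FFFF wraps to 0x8000_0000. OV and SO are set, and CR0 reports LT even though the exact sum is positive.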
addcx addcx 

Add Carrying 

addc rD,rA,rB (OE = 0 Rc = 0) 
addc. rD,rA,rB (OE = 0 Rc = 1) 
addco rD,rA,rB (OE = 1 Rc = 0) 
addco. rD,rA,rB (OE = 1 Rc = 1) 

[POWER mnemonics: a, a., ao, ao.] 

Encoding: 31 (0-5) | D (6-10) | A (11-15) | B (16-20) | OE (21) | 10 (22-30) | Rc (31) 

rD ← (rA) + (rB) 

The sum (rA) + (rB) is placed into rD. 

Other registers altered: 
• Condition Register (CR0 field): 
Affected: LT, GT, EQ, SO (if Rc = 1) 
Note: CR0 field may not reflect the infinitely precise result if overflow occurs (see 
XER below). 
• XER: 
Affected: CA 
Affected: SO, OV (if OE = 1) 

PowerPC Architecture Level Supervisor Level Optional Form 
UISA XO 
addex addex 

Add Extended 

adde rD,rA,rB (OE = 0 Rc = 0) 
adde. rD,rA,rB (OE = 0 Rc = 1) 
addeo rD,rA,rB (OE = 1 Rc = 0) 
addeo. rD,rA,rB (OE = 1 Rc = 1) 

[POWER mnemonics: ae, ae., aeo, aeo.] 

Encoding: 31 (0-5) | D (6-10) | A (11-15) | B (16-20) | OE (21) | 138 (22-30) | Rc (31) 

rD ← (rA) + (rB) + XER[CA] 

The sum (rA) + (rB) + XER[CA] is placed into rD. 

Other registers altered: 
• Condition Register (CR0 field): 
Affected: LT, GT, EQ, SO (if Rc = 1) 
Note: CR0 field may not reflect the infinitely precise result if overflow occurs (see 
XER below). 
• XER: 
Affected: CA 
Affected: SO, OV (if OE = 1) 

PowerPC Architecture Level Supervisor Level Optional Form 
UISA XO 
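addc and adde are commonly paired to add values wider than a register. addc adds the low words and sets XER[CA], and adde adds that carry into the high words. The C sketch below models each instruction as a hypothetical function that passes XER[CA] through a pointer, then chains them into a 64-bit add on a 32-bit implementation.

```c
#include <stdint.h>

/* addc: rD <- (rA) + (rB); XER[CA] <- carry out of the 32-bit sum. */
static uint32_t addc(uint32_t ra, uint32_t rb, unsigned *ca)
{
    uint32_t sum = ra + rb;
    *ca = sum < ra; /* the unsigned sum wrapped, so there was a carry out */
    return sum;
}

/* adde: rD <- (rA) + (rB) + XER[CA]; XER[CA] <- carry out. */
static uint32_t adde(uint32_t ra, uint32_t rb, unsigned *ca)
{
    uint64_t wide = (uint64_t)ra + rb + (*ca & 1u);
    *ca = (unsigned)(wide >> 32);
    return (uint32_t)wide;
}

/* 64-bit addition from 32-bit halves: addc on the low words, then adde on the high words. */
static uint64_t add64(uint64_t a, uint64_t b)
{
    unsigned ca = 0;
    uint32_t lo = addc((uint32_t)a, (uint32_t)b, &ca);
    uint32_t hi = adde((uint32_t)(a >> 32), (uint32_t)(b >> 32), &ca);
    return ((uint64_t)hi << 32) | lo;
}
```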
addi addi 

Add Immediate 

addi rD,rA,SIMM 
[POWER mnemonic: cal] 

Encoding: 14 (0-5) | D (6-10) | A (11-15) | SIMM (16-31) 

if rA = 0 then rD ← EXTS(SIMM) 
else rD ← (rA) + EXTS(SIMM) 

The sum (rA|0) + SIMM is placed into rD. 

The addi instruction is preferred for addition because it sets few status bits. Note that addi 
uses the value 0, not the contents of GPR0, if rA = 0. 

Other registers altered: 
• None 

Simplified mnemonics: 
li rD,value          equivalent to   addi rD,0,value 
la rD,disp(rA)       equivalent to   addi rD,rA,disp 
subi rD,rA,value     equivalent to   addi rD,rA,-value 

PowerPC Architecture Level Supervisor Level Optional Form 
UISA D 
addic addic 

Add Immediate Carrying 

addic rD,rA,SIMM 
[POWER mnemonic: ai] 

Encoding: 12 (0-5) | D (6-10) | A (11-15) | SIMM (16-31) 

rD ← (rA) + EXTS(SIMM) 

The sum (rA) + SIMM is placed into rD. 
Other registers altered: 
• XER: 
Affected: CA 

Simplified mnemonics: 
subic rD,rA,value    equivalent to   addic rD,rA,-value 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA D 
addic. addic. 

Add Immediate Carrying and Record 

addic. rD,rA,SIMM 
[POWER mnemonic: ai.] 

Encoding: 13 (0-5) | D (6-10) | A (11-15) | SIMM (16-31) 

rD ← (rA) + EXTS(SIMM) 
The sum (rA) + SIMM is placed into rD. 
Other registers altered: 
• Condition Register (CR0 field): 
Affected: LT, GT, EQ, SO 
Note: CR0 field may not reflect the infinitely precise result if overflow occurs (see 
XER below). 
• XER: 
Affected: CA 

Simplified mnemonics: 
subic. rD,rA,value   equivalent to   addic. rD,rA,-value 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA D 
addis addis 

Add Immediate Shifted 

addis rD,rA,SIMM 
[POWER mnemonic: cau] 

Encoding: 15 (0-5) | D (6-10) | A (11-15) | SIMM (16-31) 

if rA = 0 then rD ← EXTS(SIMM || (16)0) 
else rD ← (rA) + EXTS(SIMM || (16)0) 

The sum (rA|0) + (SIMM || 0x0000) is placed into rD. 

The addis instruction is preferred for addition because it sets few status bits. Note that 
addis uses the value 0, not the contents of GPR0, if rA = 0. 

Other registers altered: 
• None 

Simplified mnemonics: 
lis rD,value         equivalent to   addis rD,0,value 
subis rD,rA,value    equivalent to   addis rD,rA,-value 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA D 
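lis (addis) followed by addi is a common way to build a 32-bit constant in two instructions. Because addi sign-extends its SIMM operand, the upper half must be increased by one whenever bit 16 of the constant (0x8000 in the low half) is set. The C sketch below assumes this lis/addi pairing; split_const and lis_then_addi are hypothetical names. The destination register must not be r0 for the addi step, because addi with rA = 0 uses the value 0 rather than the contents of GPR0.

```c
#include <stdint.h>

/* Split a 32-bit constant into operands for "lis rD,hi" then "addi rD,rD,lo". */
static void split_const(uint32_t value, uint16_t *hi, int32_t *lo)
{
    int32_t low = (int32_t)(value & 0xFFFFu);  /* 0 .. 65535 */
    if (low >= 0x8000)
        low -= 0x10000;                        /* the value addi will sign-extend */
    *lo = low;
    *hi = (uint16_t)((value + 0x8000u) >> 16); /* compensates for a negative lo */
}

/* Model of the two-instruction sequence on a 32-bit implementation. */
static uint32_t lis_then_addi(uint16_t hi, int32_t lo)
{
    uint32_t rd = (uint32_t)hi << 16; /* lis rD,hi   (addis rD,0,hi) */
    rd += (uint32_t)lo;               /* addi rD,rD,lo: EXTS(lo) added modulo 2^32 */
    return rd;
}
```

For 0x1234_8000, split_const produces hi = 0x1235 and lo = -32768, and lis_then_addi reproduces 0x1234_8000.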
addmex addmex 

Add to Minus One Extended 

addme rD,rA (OE = 0 Rc = 0) 
addme. rD,rA (OE = 0 Rc = 1) 
addmeo rD,rA (OE = 1 Rc = 0) 
addmeo. rD,rA (OE = 1 Rc = 1) 

[POWER mnemonics: ame, ame., ameo, ameo.] 

Encoding: 31 (0-5) | D (6-10) | A (11-15) | 00000 (16-20, reserved) | OE (21) | 234 (22-30) | Rc (31) 

rD ← (rA) + XER[CA] - 1 
The sum (rA) + XER[CA] + 0xFFFF_FFFF is placed into rD. 
Other registers altered: 
• Condition Register (CR0 field): 
Affected: LT, GT, EQ, SO (if Rc = 1) 
Note: CR0 field may not reflect the infinitely precise result if overflow occurs (see 
XER below). 
• XER: 
Affected: CA 
Affected: SO, OV (if OE = 1) 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XO 
addzex addzex 

Add to Zero Extended 

addze rD,rA (OE = 0 Rc = 0) 
addze. rD,rA (OE = 0 Rc = 1) 
addzeo rD,rA (OE = 1 Rc = 0) 
addzeo. rD,rA (OE = 1 Rc = 1) 

[POWER mnemonics: aze, aze., azeo, azeo.] 

Encoding: 31 (0-5) | D (6-10) | A (11-15) | 00000 (16-20, reserved) | OE (21) | 202 (22-30) | Rc (31) 

rD ← (rA) + XER[CA] 
The sum (rA) + XER[CA] is placed into rD. 
Other registers altered: 
• Condition Register (CR0 field): 
Affected: LT, GT, EQ, SO (if Rc = 1) 
Note: CR0 field may not reflect the infinitely precise result if overflow occurs (see 
XER below). 
• XER: 
Affected: CA 
Affected: SO, OV (if OE = 1) 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XO 
andx andx 

AND 

and rA,rS,rB (Rc = 0) 
and. rA,rS,rB (Rc = 1) 

Encoding: 31 (0-5) | S (6-10) | A (11-15) | B (16-20) | 28 (21-30) | Rc (31) 

rA ← (rS) & (rB) 
The contents of rS are ANDed with the contents of rB and the result is placed into rA. 

Other registers altered: 
• Condition Register (CR0 field): 
Affected: LT, GT, EQ, SO (if Rc = 1) 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA X 
andcx andcx 

AND with Complement 

andc rA,rS,rB (Rc = 0) 
andc. rA,rS,rB (Rc = 1) 

Encoding: 31 (0-5) | S (6-10) | A (11-15) | B (16-20) | 60 (21-30) | Rc (31) 

rA ← (rS) & ¬(rB) 
The contents of rS are ANDed with the one’s complement of the contents of rB and the 
result is placed into rA. 
Other registers altered: 
• Condition Register (CR0 field): 
Affected: LT, GT, EQ, SO (if Rc = 1) 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA X 
andi. andi. 

AND Immediate 

andi. rA,rS,UIMM 
[POWER mnemonic: andil.] 

Encoding: 28 (0-5) | S (6-10) | A (11-15) | UIMM (16-31) 

rA ← (rS) & ((16)0 || UIMM) 
The contents of rS are ANDed with 0x0000 || UIMM and the result is placed into rA. 
Other registers altered: 
• Condition Register (CR0 field): 
Affected: LT, GT, EQ, SO 

PowerPC Architecture Level Supervisor Level Optional Form 
UISA D 
andis. andis. 

AND Immediate Shifted 

andis. rA,rS,UIMM 
[POWER mnemonic: andiu.] 

Encoding: 29 (0-5) | S (6-10) | A (11-15) | UIMM (16-31) 

rA ← (rS) & (UIMM || (16)0) 
The contents of rS are ANDed with UIMM || 0x0000 and the result is placed into rA. 
Other registers altered: 
• Condition Register (CR0 field): 
Affected: LT, GT, EQ, SO 

PowerPC Architecture Level Supervisor Level Optional Form 
UISA D 
bx bx 

Branch 
b target_addr (AA = 0 LK = 0) 
ba target_addr (AA = 1 LK = 0) 
bl target_addr (AA = 0 LK = 1) 
bla target_addr (AA = 1 LK = 1) 

Encoding: 18 (0-5) | LI (6-29) | AA (30) | LK (31) 

if AA then NIA ←iea EXTS(LI || 0b00) 
else NIA ←iea CIA + EXTS(LI || 0b00) 
if LK then LR ←iea CIA + 4 

target_addr specifies the branch target address. 

If AA = 0, then the branch target address is the sum of LI || 0b00 sign-extended and the 
address of this instruction. 

If AA = 1, then the branch target address is the value LI || 0b00 sign-extended. 

If LK = 1, then the effective address of the instruction following the branch instruction is 
placed into the link register. 

Other registers altered: 
Affected: Link Register (LR) (if LK = 1) 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA I 
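The C sketch below computes the branch target for b. It assumes a 32-bit implementation and the encoding above, with LI in bits 6-29 and AA in bit 30. b_target and b_link are hypothetical names, and all arithmetic wraps modulo 2^32.

```c
#include <stdint.h>

/* Branch target of an I-form branch; insn is the instruction word and cia
   is the address of the branch instruction itself. */
static uint32_t b_target(uint32_t insn, uint32_t cia)
{
    uint32_t disp = insn & 0x03FFFFFCu; /* LI || 0b00: bits 6-29, then two zero bits */
    if (disp & 0x02000000u)             /* sign bit of LI (instruction bit 6) */
        disp |= 0xFC000000u;            /* EXTS to 32 bits */
    unsigned aa = (insn >> 1) & 1u;     /* AA is instruction bit 30 */
    return aa ? disp : cia + disp;
}

/* Value placed in the LR when LK = 1. */
static uint32_t b_link(uint32_t cia)
{
    return cia + 4;
}
```

For example, 0x4BFFFFFC (LI || 0b00 = -4, AA = 0) at address 0x1000 branches to 0xFFC.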
bcx bcx 

Branch Conditional 

bc BO,BI,target_addr (AA = 0 LK = 0) 
bca BO,BI,target_addr (AA = 1 LK = 0) 
bcl BO,BI,target_addr (AA = 0 LK = 1) 
bcla BO,BI,target_addr (AA = 1 LK = 1) 

Encoding: 16 (0-5) | BO (6-10) | BI (11-15) | BD (16-29) | AA (30) | LK (31) 

if ¬ BO[2] then CTR ← CTR - 1 
ctr_ok ← BO[2] | ((CTR ≠ 0) ⊕ BO[3]) 
cond_ok ← BO[0] | (CR[BI] ≡ BO[1]) 
if ctr_ok & cond_ok then 
    if AA then NIA ←iea EXTS(BD || 0b00) 
    else NIA ←iea CIA + EXTS(BD || 0b00) 
if LK then LR ←iea CIA + 4 

The BI field specifies the bit in the condition register (CR) to be used as the condition of 
the branch. The BO field is encoded as described in Table 8-6. Additional information about 
BO field encoding is provided in Section 4.2.4.2, “Conditional Branch Control.” 

Table 8-6. BO Operand Encodings 
BO      Description 
0000y   Decrement the CTR, then branch if the decremented CTR ≠ 0 and the condition is FALSE. 
0001y   Decrement the CTR, then branch if the decremented CTR = 0 and the condition is FALSE. 
001zy   Branch if the condition is FALSE. 
0100y   Decrement the CTR, then branch if the decremented CTR ≠ 0 and the condition is TRUE. 
0101y   Decrement the CTR, then branch if the decremented CTR = 0 and the condition is TRUE. 
011zy   Branch if the condition is TRUE. 
1z00y   Decrement the CTR, then branch if the decremented CTR ≠ 0. 
1z01y   Decrement the CTR, then branch if the decremented CTR = 0. 
1z1zz   Branch always. 
In this table, z indicates a bit that is ignored. 
Note that the z bits should be cleared, as they may be assigned a meaning in some future version of the 
PowerPC architecture. 
The y bit provides a hint about whether a conditional branch is likely to be taken, and may be used by some 
PowerPC implementations to improve performance. 

target_addr specifies the branch target address. 

If AA = 0, the branch target address is the sum of BD || 0b00 sign-extended and the address 
of this instruction. 

If AA = 1, the branch target address is the value BD || 0b00 sign-extended. 

If LK = 1, the effective address of the instruction following the branch instruction is placed 
into the link register. 
Other registers altered: 
Affected: Count Register (CTR) (if BO[2] = 0) 
Affected: Link Register (LR) (if LK = 1) 

Simplified mnemonics: 
blt target          equivalent to   bc 12,0,target 
bne cr2,target      equivalent to   bc 4,10,target 
bdnz target         equivalent to   bc 16,0,target 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA B 
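The bc pseudocode translates almost line for line into C. In the sketch below, the function names are hypothetical, and BO and CR bits use the manual's numbering, with bit 0 most significant. The function decrements the CTR when BO[2] = 0 and reports whether the branch is taken. It does not model the LR update for LK = 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* BO[i] of the 5-bit BO field; BO[0] is the most-significant bit. */
static unsigned bo_bit(unsigned bo, unsigned i) { return (bo >> (4 - i)) & 1u; }

/* CR[bi] of the 32-bit condition register; CR bit 0 is the most-significant bit. */
static unsigned cr_bit(uint32_t cr, unsigned bi) { return (cr >> (31 - bi)) & 1u; }

/* Follows the bc pseudocode: may decrement *ctr, then reports whether the branch is taken. */
static bool bc_taken(unsigned bo, unsigned bi, uint32_t cr, uint32_t *ctr)
{
    if (!bo_bit(bo, 2))
        *ctr -= 1;
    bool ctr_ok = bo_bit(bo, 2) || ((*ctr != 0) ^ bo_bit(bo, 3));
    bool cond_ok = bo_bit(bo, 0) || (cr_bit(cr, bi) == bo_bit(bo, 1));
    return ctr_ok && cond_ok;
}
```

With BO = 16 (bdnz) and CTR = 2, the first call decrements CTR to 1 and branches; the second decrements it to 0 and falls through.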
bcctrx bcctrx 

Branch Conditional to Count Register 

bcctr BO,BI (LK = 0) 
bcctrl BO,BI (LK = 1) 

[POWER mnemonics: bcc, bccl] 

Encoding: 19 (0-5) | BO (6-10) | BI (11-15) | 00000 (16-20, reserved) | 528 (21-30) | LK (31) 

cond_ok ← BO[0] | (CR[BI] ≡ BO[1]) 
if cond_ok then 
    NIA ←iea CTR[0-29] || 0b00 
if LK then LR ←iea CIA + 4 

The BI field specifies the bit in the condition register to be used as the condition of the 
branch. The BO field is encoded as described in Table 8-7. Additional information about 
BO field encoding is provided in Section 4.2.4.2, “Conditional Branch Control.” 

Table 8-7. BO Operand Encodings 
BO      Description 
001zy   Branch if the condition is FALSE. 
011zy   Branch if the condition is TRUE. 
1z1zz   Branch always. 
In this table, z indicates a bit that is ignored. 
Note that the z bits should be cleared, as they may be assigned a meaning in some future version of the 
PowerPC architecture. 
The y bit provides a hint about whether a conditional branch is likely to be taken, and may be used by some 
PowerPC implementations to improve performance. 

The branch target address is CTR[0-29] || 0b00. 

If LK = 1, the effective address of the instruction following the branch instruction is placed 
into the link register. 

If the “decrement and test CTR” option is specified (BO[2] = 0), the instruction form is 
invalid. 

Other registers altered: 
Affected: Link Register (LR) (if LK = 1) 

Simplified mnemonics: 
bltctr              equivalent to   bcctr 12,0 
bnectr cr2          equivalent to   bcctr 4,10 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XL 
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bclirx bclirx 


Branch Conditional to Link Register 


belr BO,BI (LK = 0) 
belrl BO,BI (LK=1) 


[POWER mnemonics: ber, ber] 


[_] Reserved 





19 BO BI 00000 16 LK 
0 5 6 10 11 15 16 20 21 30 31 

if 7 BO[2] then CTR ¢ CIR - 1 

ctr_ok © BO[2] | ((CTR # 0) ® BO[3]) 

cond_ok ¢ BO[0] | (CR[BI] = BO[1]) 

if ctr_ok & cond_ok then 


NIA <iea LR || ObOO 
if LK then LR ¢iea CIA + 4 


The BI field specifies the bit in the condition register to be used as the condition of the 
branch. The BO field is encoded as described in Table 8-8. Additional information about 
BO field encoding is provided in Section 4.2.4.2, “Conditional Branch Control.” 


Table 8-8. BO Operand Encodings 


i ee ee 





In this table, z indicates a bit that is ignored. 
Note that the z bits should be cleared, as they may be assigned a meaning in some future version of 
the PowerPC architecture. 


The y bit provides a hint about whether a conditional branch is likely to be taken, and may be used by 
some PowerPC implementations to improve performance. 


The branch target address is LR II Ob00. 
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If LK = 1, then the effective address of the instruction following the branch instruction is 
placed into the link register. 
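The branch decision in the pseudocode above can be modeled in C as a rough sketch. The struct `BranchRegs` and the functions `bclr_next`, `bo_bit`, and `cr_bit` are hypothetical names chosen for illustration; the model assumes a 32-bit CTR and the architecture's big-endian bit numbering, in which BO[0] and CR bit 0 are the most significant bits. The target address is captured before LR is updated, so bclrl branches to the old LR value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register-state model; names are illustrative only. */
typedef struct {
    uint32_t cr;   /* condition register; CR bit 0 is the most significant bit */
    uint32_t ctr;  /* count register */
    uint32_t lr;   /* link register */
} BranchRegs;

/* BO[i] of the 5-bit BO field, where BO[0] is the most significant bit. */
static bool bo_bit(unsigned bo, unsigned i) {
    return (bo >> (4 - i)) & 1u;
}

/* CR[n] using big-endian bit numbering. */
static bool cr_bit(uint32_t cr, unsigned n) {
    return (cr >> (31 - n)) & 1u;
}

/* Models bclr/bclrl at current instruction address cia; returns the next
   instruction address and updates CTR and LR as the pseudocode does. */
static uint32_t bclr_next(BranchRegs *r, unsigned bo, unsigned bi, bool lk, uint32_t cia) {
    if (!bo_bit(bo, 2))
        r->ctr -= 1;                                      /* decrement CTR */
    bool ctr_ok = bo_bit(bo, 2) || ((r->ctr != 0) ^ bo_bit(bo, 3));
    bool cond_ok = bo_bit(bo, 0) || (cr_bit(r->cr, bi) == bo_bit(bo, 1));
    uint32_t target = r->lr & ~3u;                        /* LR || 0b00 */
    uint32_t nia = (ctr_ok && cond_ok) ? target : cia + 4;
    if (lk)
        r->lr = cia + 4;                                  /* after target is read */
    return nia;
}
```

Under these assumptions, `bclr_next(&r, 12, 0, false, cia)` behaves like bltlr and `bclr_next(&r, 16, 0, false, cia)` like bdnzlr.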
Other registers altered: 

Affected: Count Register (CTR) (if BO[2] = 0) 

Affected: Link Register (LR) (if LK = 1) 


Simplified mnemonics: 





bltlr equivalent to bclr 12,0
bnelr cr2 equivalent to bclr 4,10
bdnzlr equivalent to bclr 16,0
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XL 
cmp cmp








Compare 
cmp crfD,L,rA,rB 
[_] Reserved 
31 crfD 0 L A B 0000000000 0
0 5 6 8 9 10 11 15 16 20 21 30 31 


if L = 0 then a ← EXTS(rA)
              b ← EXTS(rB)
else a ← (rA)
     b ← (rB)
if a < b then c ← 0b100
else if a > b then c ← 0b010
else c ← 0b001
CR[4 * crfD–4 * crfD + 3] ← c || XER[SO]


The contents of rA are compared with the contents of rB, treating the operands as signed
integers. The result of the comparison is placed into CR field crfD.


In 32-bit implementations, if L = 1 the instruction form is invalid.
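A minimal C sketch of how the 4-bit result (c || XER[SO]) is formed and written into CR field crfD follows. The helper names are hypothetical; the model assumes CR bit 0 is the most significant bit, so field n occupies bits 4n through 4n + 3 counted from the left.

```c
#include <stdint.h>

/* c || XER[SO] for a signed compare: LT, GT, EQ, then SO. */
static uint32_t cmp_field_signed(int32_t a, int32_t b, unsigned xer_so) {
    uint32_t c = (a < b) ? 0x4u : (a > b) ? 0x2u : 0x1u; /* 0b100 / 0b010 / 0b001 */
    return (c << 1) | (xer_so & 1u);                      /* append SO as bit 3 of the field */
}

/* Writes a 4-bit field into CR[4*crfd .. 4*crfd+3] (CR bit 0 = MSB). */
static uint32_t cr_set_field(uint32_t cr, unsigned crfd, uint32_t field) {
    unsigned shift = 28 - 4 * crfd;
    return (cr & ~(0xFu << shift)) | ((field & 0xFu) << shift);
}
```

With these assumptions, cmpw cr3,rA,rB with rA = -1 and rB = 1 sets LT in CR3, which is 0x0008_0000 in an otherwise cleared CR.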





Other registers altered: 
• Condition Register (CR field specified by operand crfD):
Affected: LT, GT, EQ, SO 


Simplified mnemonics: 





cmpd rA,rB equivalent to cmp 0,1,rA,rB 
cmpw cr3,rA,rB equivalent to cmp 3,0,rA,rB 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA X
cmpi cmpi


Compare Immediate 





cmpi crfD,L,rA,SIMM 
[_] Reserved 
11 crfD 0 L A SIMM
0 5 6 8 9 10 11 15 16 31 
a ← (rA)
if a < EXTS(SIMM) then c ← 0b100
else if a > EXTS(SIMM) then c ← 0b010
else c ← 0b001
CR[4 * crfD–4 * crfD + 3] ← c || XER[SO]


The contents of rA are compared with the sign-extended value of the SIMM field, treating
the operands as signed integers. The result of the comparison is placed into CR field crfD.


In 32-bit implementations, if L = 1 the instruction form is invalid.


Other registers altered: 
• Condition Register (CR field specified by operand crfD):
Affected: LT, GT, EQ, SO 


Simplified mnemonics: 





cmpdi rA,value equivalent to cmpi 0,1,rA,value
cmpwi cr3,rA,value equivalent to cmpi 3,0,rA,value 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA D 
cmpl cmpl


Compare Logical 








cmpl crfD,L,rA,rB 
[_] Reserved 
31 crfD 0 L A B 32 0
0 5 6 8 9 10 11 15 16 20 21 30 31
a ← (rA)
b ← (rB)
if a <U b then c ← 0b100
else if a >U b then c ← 0b010
else c ← 0b001
CR[4 * crfD–4 * crfD + 3] ← c || XER[SO]


The contents of rA are compared with the contents of rB, treating the operands as unsigned
integers. The result of the comparison is placed into CR field crfD.
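The only difference between cmp and cmpl is how the operand bits are interpreted. This hypothetical C helper evaluates the same two register values both ways; it assumes the usual two's-complement conversion from `uint32_t` to `int32_t`, which all mainstream compilers provide.

```c
#include <stdint.h>

/* c || XER[SO] for cmpl (is_unsigned != 0) or cmp (is_unsigned == 0). */
static uint32_t cmp_field(uint32_t ra, uint32_t rb, int is_unsigned, unsigned so) {
    uint32_t c;
    if (is_unsigned) {
        c = (ra < rb) ? 0x4u : (ra > rb) ? 0x2u : 0x1u;
    } else {
        int32_t a = (int32_t)ra;   /* reinterpret as signed (two's complement assumed) */
        int32_t b = (int32_t)rb;
        c = (a < b) ? 0x4u : (a > b) ? 0x2u : 0x1u;
    }
    return (c << 1) | (so & 1u);
}
```

With rA = 0xFFFF_FFFF and rB = 1, cmpl reports GT while cmp reports LT.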


In 32-bit implementations, if L = 1 the instruction form is invalid. 


Other registers altered: 
• Condition Register (CR field specified by operand crfD):
Affected: LT, GT, EQ, SO 


Simplified mnemonics: 





cmpld rA,rB equivalent to cmpl 0,1,rA,rB
cmplw cr3,rA,rB equivalent to cmpl 3,0,rA,rB
PowerPC Architecture Level Supervisor Level Optional Form 
UISA X




















8-32 PowerPC Microprocessor Family: The Programming Environments (32-Bit) 


cmpli cmpli 


Compare Logical Immediate 








cmpli crfD,L,rA,UIMM 
[_] Reserved 
10 crfD 0 L A UIMM
0 5 6 8 9 10 11 15 16 31 
a ← (rA)
if a <U ((16)0 || UIMM) then c ← 0b100
else if a >U ((16)0 || UIMM) then c ← 0b010
else c ← 0b001
CR[4 * crfD–4 * crfD + 3] ← c || XER[SO]


The contents of rA are compared with 0x0000 || UIMM, treating the operands as unsigned
integers. The result of the comparison is placed into CR field crfD.
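The immediate forms differ in how the 16-bit field is widened: cmpi sign-extends SIMM, whereas cmpli zero-extends UIMM (0x0000 || UIMM). The following C sketch uses hypothetical function names and assumes two's-complement narrowing conversions.

```c
#include <stdint.h>

/* cmpi: compare signed rA with EXTS(SIMM); returns c || XER[SO]. */
static uint32_t cmpi_field(uint32_t ra, uint16_t simm, unsigned so) {
    int32_t a = (int32_t)ra;
    int32_t b = (int16_t)simm;          /* EXTS(SIMM) */
    uint32_t c = (a < b) ? 0x4u : (a > b) ? 0x2u : 0x1u;
    return (c << 1) | (so & 1u);
}

/* cmpli: compare unsigned rA with (16)0 || UIMM; returns c || XER[SO]. */
static uint32_t cmpli_field(uint32_t ra, uint16_t uimm, unsigned so) {
    uint32_t b = (uint32_t)uimm;        /* zero-extended immediate */
    uint32_t c = (ra < b) ? 0x4u : (ra > b) ? 0x2u : 0x1u;
    return (c << 1) | (so & 1u);
}
```

An immediate of 0xFFFF therefore means -1 to cmpi but 65,535 to cmpli, so comparing rA = 0xFFFF_FFFF against it yields EQ for cmpi and GT for cmpli.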


In 32-bit implementations, if L = 1 the instruction form is invalid. 


Other registers altered: 
• Condition Register (CR field specified by operand crfD):
Affected: LT, GT, EQ, SO 


Simplified mnemonics: 





cmpldi rA,value equivalent to cmpli 0,1,rA,value
cmplwi cr3,rA,value equivalent to cmpli 3,0,rA,value 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA D 
cntlzwx cntlzwx


Count Leading Zeros Word 


cntlzw rA,rS (Rc = 0)
cntlzw. rA,rS (Rc = 1)


[POWER mnemonics: cntlz, cntlz.]


[_] Reserved
31 S A 00000 26 Rc
0 5 6 10 11 15 16 20 21 30 31
n ← 0
do while n < 32
    if rS[n] = 1 then leave
    n ← n + 1
rA ← n


A count of the number of consecutive zero bits starting at bit 0 of rS is placed into rA. This 
number ranges from 0 to 32, inclusive. 
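The counting loop in the pseudocode translates directly into C. The following is a plain transcription for illustration, with bit 0 taken as the most significant bit of rS; the function name is hypothetical.

```c
#include <stdint.h>

/* Counts consecutive zero bits starting at bit 0 (the MSB) of rs; result is 0..32. */
static uint32_t cntlzw(uint32_t rs) {
    uint32_t n = 0;
    while (n < 32) {
        if ((rs >> (31 - n)) & 1u)   /* rS[n] = 1: leave the loop */
            break;
        n++;
    }
    return n;
}
```

Because the result is never negative, the recording form can never set LT in CR0, and EQ is set only when bit 0 of rS is 1 (a result of zero).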
Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)
Note: If Rc = 1, then LT is cleared in the CR0 field.


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X


crand crand 


Condition Register AND 


crand crbD,crbA,crbB 

[_] Reserved 
19 crbD crbA crbB 257 0
0 5 6 10 11 15 16 20 21 30 31 


CR[crbD] ← CR[crbA] & CR[crbB]
The bit in the condition register specified by crbA is ANDed with the bit in the condition
register specified by crbB. The result is placed into the condition register bit specified by
crbD.
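Because CR bits are numbered from the most significant end, a bit number n in crbA, crbB, or crbD selects bit 31 - n of a 32-bit host integer. The following C sketch models crand under that assumption; the helper names are hypothetical.

```c
#include <stdint.h>

/* Reads CR[bit], where bit 0 is the most significant bit. */
static unsigned cr_get(uint32_t cr, unsigned bit) {
    return (cr >> (31 - bit)) & 1u;
}

/* Returns cr with CR[bit] set to v (0 or 1). */
static uint32_t cr_put(uint32_t cr, unsigned bit, unsigned v) {
    uint32_t mask = 1u << (31 - bit);
    return v ? (cr | mask) : (cr & ~mask);
}

/* crand crbD,crbA,crbB: CR[crbD] <- CR[crbA] & CR[crbB] */
static uint32_t crand(uint32_t cr, unsigned crbd, unsigned crba, unsigned crbb) {
    return cr_put(cr, crbd, cr_get(cr, crba) & cr_get(cr, crbb));
}
```

For instance, crand 30,0,5 stores CR0[LT] AND CR1[GT] into CR7[EQ].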
Other registers altered: 
• Condition Register:


Affected: Bit specified by operand crbD 


PowerPC Architecture Level Supervisor Level Optional Form 


UISA XL 
crandc crandc


Condition Register AND with Complement 





crandc crbD,crbA,crbB 
[_] Reserved 
19 crbD crbA crbB 129 0 
0 5 6 10 11 15 16 20 21 30 31 


CR[crbD] ← CR[crbA] & ¬CR[crbB]
The bit in the condition register specified by crbA is ANDed with the complement of the
bit in the condition register specified by crbB and the result is placed into the condition
register bit specified by crbD.
Other registers altered: 
• Condition Register:


Affected: Bit specified by operand crbD 


PowerPC Architecture Level Supervisor Level Optional Form 








UISA XL


creqv creqv 


Condition Register Equivalent 


creqv crbD,crbA,crbB 

[_] Reserved 
19 crbD crbA crbB 289 0
0 5 6 10 11 15 16 20 21 30 31 


CR[crbD] ← CR[crbA] ≡ CR[crbB]
The bit in the condition register specified by crbA is XORed with the bit in the condition
register specified by crbB and the complemented result is placed into the condition register
bit specified by crbD.
Other registers altered: 
• Condition Register:
Affected: Bit specified by operand crbD 


Simplified mnemonics: 





crset crbD equivalent to creqv crbD,crbD,crbD 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XL 
crnand crnand


Condition Register NAND 


crnand crbD,crbA,crbB 

[_] Reserved 
19 crbD crbA crbB 225 0
0 5 6 10 11 15 16 20 21 30 31 


CR[crbD] ← ¬(CR[crbA] & CR[crbB])
The bit in the condition register specified by crbA is ANDed with the bit in the condition
register specified by crbB and the complemented result is placed into the condition register
bit specified by crbD.
Other registers altered: 
• Condition Register:


Affected: Bit specified by operand crbD 


PowerPC Architecture Level Supervisor Level Optional Form 


UISA XL


crnor crnor 


Condition Register NOR 





crnor crbD,crbA,crbB 
[_] Reserved 
19 crbD crbA crbB 33 0 
0 5 6 10 11 15 16 20 21 30 31 


CR[crbD] ← ¬(CR[crbA] | CR[crbB])
The bit in the condition register specified by crbA is ORed with the bit in the condition
register specified by crbB and the complemented result is placed into the condition register
bit specified by crbD.
Other registers altered: 
• Condition Register:
Affected: Bit specified by operand crbD 


Simplified mnemonics: 





crnot crbD,crbA equivalent to crnor crbD,crbA,crbA 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XL 




















Chapter 8. Instruction Set 8-39 


cror cror 


Condition Register OR 


cror crbD,crbA,crbB 

[_] Reserved 
19 crbD crbA crbB 449 0
0 5 6 10 11 15 16 20 21 30 31 


CR[crbD] ← CR[crbA] | CR[crbB]
The bit in the condition register specified by crbA is ORed with the bit in the condition
register specified by crbB. The result is placed into the condition register bit specified by
crbD.
Other registers altered: 
• Condition Register:
Affected: Bit specified by operand crbD 


Simplified mnemonics: 





crmove crbD,crbA equivalent to cror crbD,crbA,crbA 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XL


crorc crorc


Condition Register OR with Complement 








crorc crbD,crbA,crbB
[_] Reserved 
19 crbD crbA crbB 417 0 
0 5 6 10 11 15 16 20 21 30 31 


CR[crbD] ← CR[crbA] | ¬CR[crbB]
The bit in the condition register specified by crbA is ORed with the complement of the
condition register bit specified by crbB and the result is placed into the condition register
bit specified by crbD.
Other registers altered: 
• Condition Register:


Affected: Bit specified by operand crbD 


PowerPC Architecture Level Supervisor Level Optional Form 





UISA XL 
crxor crxor


Condition Register XOR 





crxor crbD,crbA,crbB 
[_] Reserved 
19 crbD crbA crbB 193 0 
0 5 6 10 11 15 16 20 21 30 31 


CR[crbD] ← CR[crbA] ⊕ CR[crbB]
The bit in the condition register specified by crbA is XORed with the bit in the condition
register specified by crbB and the result is placed into the condition register bit specified by
crbD.
Other registers altered: 
• Condition Register:
Affected: Bit specified by operand crbD


Simplified mnemonics: 





crclr crbD equivalent to crxor crbD,crbD,crbD
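Each simplified mnemonic given for the condition register logical instructions rests on a bit identity: x ⊕ x = 0 (crclr), x ≡ x = 1 (crset), ¬(x | x) = ¬x (crnot), and x | x = x (crmove). The C sketch below, with hypothetical helper names, applies the underlying instructions to a 32-bit model of the CR (bit 0 = most significant bit) so the identities can be checked.

```c
#include <stdint.h>

/* CR bit b (0 = most significant) of a 32-bit condition register value. */
static unsigned bit_of(uint32_t cr, unsigned b) {
    return (cr >> (31 - b)) & 1u;
}

static uint32_t with_bit(uint32_t cr, unsigned b, unsigned v) {
    uint32_t mask = 1u << (31 - b);
    return v ? (cr | mask) : (cr & ~mask);
}

/* Two-input bit functions corresponding to crxor, creqv, crnor, and cror. */
typedef unsigned (*CrOp)(unsigned, unsigned);
static unsigned op_xor(unsigned a, unsigned b) { return a ^ b; }
static unsigned op_eqv(unsigned a, unsigned b) { return !(a ^ b); }
static unsigned op_nor(unsigned a, unsigned b) { return !(a | b); }
static unsigned op_or(unsigned a, unsigned b)  { return a | b; }

/* CR[d] <- op(CR[a], CR[b]) */
static uint32_t cr_logical(uint32_t cr, CrOp op, unsigned d, unsigned a, unsigned b) {
    return with_bit(cr, d, op(bit_of(cr, a), bit_of(cr, b)));
}
```

In this model, crclr 6 is `cr_logical(cr, op_xor, 6, 6, 6)`, crset 6 is `cr_logical(cr, op_eqv, 6, 6, 6)`, and crnot 2,0 is `cr_logical(cr, op_nor, 2, 0, 0)`.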
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XL


dcba dcba 


Data Cache Block Allocate 








dcba rA,rB 
[_] Reserved 
31 00000 A B 758 0 
0 5 6 10 11 15 16 20 21 30 31 
EA is the sum (rA|0) + (rB).


The dcba instruction allocates the block in the data cache addressed by EA, by marking it 
valid without reading the contents of the block from memory; the data in the cache block 
is considered to be undefined after this instruction completes. This instruction is a hint that 
the program will probably soon store into a portion of the block, but the contents of the rest 
of the block are not meaningful to the program (eliminating the need to read the entire block 
from main memory), and can provide for improved performance in these code sequences. 


The dcba instruction executes as follows:


• If the cache block containing the byte addressed by EA is in the data cache, the
contents of all bytes are made undefined but the cache block is still considered valid.
Note that programming errors can occur if the data in this cache block is
subsequently read or used inadvertently.


• If the cache block containing the byte addressed by EA is not in the data cache and
the corresponding memory page or block is caching-allowed, the cache block is
allocated (and made valid) in the data cache without fetching the block from main
memory, and the value of all bytes is undefined.


• If the addressed byte corresponds to a caching-inhibited page or block (i.e., if the I
bit is set), this instruction is treated as a no-op.


• If the cache block containing the byte addressed by EA is in coherency-required
mode, and the cache block exists in the data cache(s) of any other processor(s), it is
kept coherent in those caches (i.e., the processor performs the appropriate bus
transactions to enforce this).


This instruction is treated as a store to the addressed byte with respect to address translation, 
memory protection, referenced and changed recording and the ordering enforced by eieio 
or by the combination of caching-inhibited and guarded attributes for a page (or block). 
However, the DSI exception is not invoked for a translation or protection violation, and the 
referenced and changed bits need not be updated when the page or block is cache-inhibited 
(causing the instruction to be treated as a no-op). 


This instruction is optional in the PowerPC architecture. 
Other registers altered:
• None


In the PowerPC OEA, the dcba instruction is additionally defined to clear all bytes of a
newly established block to zero in the case that the block did not already exist in the cache. 


Additionally, as the dcba instruction may establish a block in the data cache without 
verifying that the associated physical address is valid, a delayed machine check exception 
is possible. See Chapter 6, “Exceptions,” for a discussion about this type of machine check 
exception. 


PowerPC Architecture Level Supervisor Level Optional Form 





VEA V X


dcbf dcbf 


Data Cache Block Flush 








dcbf rA,rB 
[_] Reserved 
31 00000 A B 86 0
0 5 6 10 11 15 16 20 21 30 31 
EA is the sum (rA|0) + (rB).


The dcbf instruction invalidates the block in the data cache addressed by EA, copying the
block to memory first, if there is any dirty data in it. If the processor is a multiprocessor
implementation (for example, the 601, 604, and 604e) and the block is marked coherency-
required, the processor will, if necessary, send an address-only broadcast to other
processors. The broadcast of the dcbf instruction causes another processor to copy the
block to memory, if it has dirty data, and then invalidate the block from the cache.


The action taken depends on the memory mode associated with the block containing the 
byte addressed by EA and on the state of that block. The list below describes the action 
taken for the various states of the memory coherency attribute (M bit). 


• Coherency required


— Unmodified block—Invalidates copies of the block in the data caches of all 
processors. 


— Modified block—Copies the block to memory. Invalidates copies of the block in 
the data caches of all processors. 


— Absent block—If modified copies of the block are in the data caches of other 
processors, causes them to be copied to memory and invalidated in those data 
caches. If unmodified copies are in the data caches of other processors, causes 
those copies to be invalidated in those data caches. 


• Coherency not required
— Unmodified block—Invalidates the block in the processor’s data cache. 


— Modified block—Copies the block to memory. Invalidates the block in the 
processor’s data cache. 


— Absent block (target block not in cache)—No action is taken. 


The function of this instruction is independent of the write-through, write-back and 
caching-inhibited/allowed modes of the block containing the byte addressed by EA. 
This instruction is treated as a load from the addressed byte with respect to address
translation and memory protection. It is also treated as a load for referenced and changed 
bit recording except that referenced and changed bit recording may not occur. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





VEA X 
dcbi dcbi


Data Cache Block Invalidate 








dcbi rA,rB
[_] Reserved 
31 00000 A B 470 0
0 5 6 10 11 15 16 20 21 30 31 
EA is the sum (rA|0) + (rB).


The action taken is dependent on the memory mode associated with the block containing 
the byte addressed by EA and on the state of that block. The list below describes the action 
taken if the block containing the byte addressed by EA is or is not in the cache. 


• Coherency required
— Unmodified block—Invalidates copies of the block in the data caches of all
processors. 


— Modified block—Invalidates copies of the block in the data caches of all 
processors. (Discards the modified contents.) 


— Absent block—If copies of the block are in the data caches of any other 
processor, causes the copies to be invalidated in those data caches. (Discards any 
modified contents.) 


• Coherency not required


— Unmodified block—Invalidates the block in the processor’s data cache. 


— Modified block—Invalidates the block in the processor’s data cache. (Discards
the modified contents.) 


— Absent block (target block not in cache)—No action is taken. 


When data address translation is enabled (MSR[DR] = 1) and the virtual address has no
translation, a DSI exception occurs.


The function of this instruction is independent of the write-through and caching- 
inhibited/allowed modes of the block containing the byte addressed by EA. This instruction 
operates as a store to the addressed byte with respect to address translation and protection. 
The referenced and changed bits are modified appropriately. 


This is a supervisor-level instruction. 


Other registers altered: 
• None


PowerPC Architecture Level Supervisor Level Optional Form 


OEA V X 
dcbst dcbst


Data Cache Block Store 








dcbst rA,rB 
[_] Reserved 
31 00000 A B 54 0 
0 5 6 10 11 15 16 20 21 30 31 
EA is the sum (rA|0) + (rB).


The dcbst instruction executes as follows:


• If the block containing the byte addressed by EA is in coherency-required mode, and
a block containing the byte addressed by EA is in the data cache of any processor
and has been modified, the writing of it to main memory is initiated.


• If the block containing the byte addressed by EA is in coherency-not-required mode,
and a block containing the byte addressed by EA is in the data cache of this
processor and has been modified, the writing of it to main memory is initiated.


The function of this instruction is independent of the write-through and caching- 
inhibited/allowed modes of the block containing the byte addressed by EA. 


The processor treats this instruction as a load from the addressed byte with respect to 
address translation and memory protection. It is also treated as a load for referenced and 
changed bit recording except that referenced and changed bit recording may not occur. 
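A common use of dcbst is to push a modified buffer out to memory one cache block at a time, often followed by a sync instruction. The C sketch below only computes which block addresses such a loop must cover; it assumes a 32-byte cache block (`BLOCK_SIZE`), which is implementation-specific, and the function name `blocks_to_store` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache block size; the real value depends on the implementation. */
#define BLOCK_SIZE 32u

/* Collects the block-aligned EAs a dcbst loop would need to cover the byte
   range [start, start + len). Returns the number of addresses written to out. */
static size_t blocks_to_store(uint32_t start, uint32_t len, uint32_t *out, size_t max) {
    if (len == 0)
        return 0;
    uint32_t first = start & ~(BLOCK_SIZE - 1);
    uint32_t last = (start + len - 1) & ~(BLOCK_SIZE - 1);
    size_t n = 0;
    for (uint32_t ea = first; n < max; ea += BLOCK_SIZE) {
        out[n++] = ea;               /* one dcbst per block address */
        if (ea == last)
            break;
    }
    return n;
}
```

A range that starts mid-block still needs the partially covered first and last blocks; for example, 0x40 bytes starting at 0x1010 span three 32-byte blocks.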


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





VEA X


dcbt dcbt


Data Cache Block Touch 








dcbt rA,rB
[_] Reserved 
31 00000 A B 278 0 
0 5 6 10 11 15 16 20 21 30 31 
EA is the sum (rA|0) + (rB).


This instruction is a hint that performance will possibly be improved if the block containing 
the byte addressed by EA is fetched into the data cache, because the program will probably 
soon load from the addressed byte. If the block is caching-inhibited, the hint is ignored and 
the instruction is treated as a no-op. Executing dcbt does not cause the system alignment
error handler to be invoked.


This instruction is treated as a load from the addressed byte with respect to address 
translation, memory protection, and reference and change recording except that referenced 
and changed bit recording may not occur. Additionally, no exception occurs in the case of 
a translation fault or protection violation. 


The program uses the dcbt instruction to request a cache block fetch before it is actually
needed by the program. The program can later execute load instructions to put data into 
registers. However, the processor is not obliged to load the addressed block into the data 
cache. Note that this instruction is defined architecturally to perform the same functions as 
the dcbtst instruction. Both are defined in order to allow implementations to differentiate
the bus actions when fetching into the cache for the case of a load and for a store. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





VEA X 
dcbtst dcbtst


Data Cache Block Touch for Store 


dcbtst rA,rB 

[_] Reserved 
31 00000 A B 246 0
0 5 6 10 11 15 16 20 21 30 31 
EA is the sum (rA|0) + (rB).


This instruction is a hint that performance will possibly be improved if the block containing 
the byte addressed by EA is fetched into the data cache, because the program will probably 
soon store to the addressed byte. If the block is caching-inhibited, the hint is ignored and
the instruction is treated as a no-op. Executing dcbtst does not cause the system alignment
error handler to be invoked. 


This instruction is treated as a load from the addressed byte with respect to address 
translation, memory protection, and reference and change recording except that referenced 
and changed bit recording may not occur. Additionally, no exception occurs in the case of 
a translation fault or protection violation. 


The program uses dcbtst to request a cache block fetch to potentially improve performance
for a subsequent store to that EA, as that store would then be to a cached location. However, 
the processor is not obliged to load the addressed block into the data cache. Note that this 
instruction is defined architecturally to perform the same functions as the dcbt instruction.
Both are defined in order to allow implementations to differentiate the bus actions when 
fetching into the cache for the case of a load and for a store. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





VEA X


dcbz dcbz 


Data Cache Block Clear to Zero 


dcbz rA,rB 
[POWER mnemonic: dclz]


[_] Reserved 


31 00000 A B 1014 0
0 5 6 10 11 15 16 20 21 30 31
EA is the sum (rA|0) + (rB).


The dcbz instruction executes as follows:


• If the cache block containing the byte addressed by EA is in the data cache, all bytes
are cleared.


• If the cache block containing the byte addressed by EA is not in the data cache and
the corresponding memory page or block is caching-allowed, the cache block is
allocated (and made valid) in the data cache without fetching the block from main
memory, and all bytes are cleared.

• If the page containing the byte addressed by EA is in caching-inhibited or
write-through mode, either all bytes of main memory that correspond to the addressed
cache block are cleared or the alignment exception handler is invoked. The
exception handler can then clear all bytes in main memory that correspond to the
addressed cache block.

• If the cache block containing the byte addressed by EA is in coherency-required
mode, and the cache block exists in the data cache(s) of any other processor(s), it is
kept coherent in those caches (i.e., the processor performs the appropriate bus
transactions to enforce this).


This instruction is treated as a store to the addressed byte with respect to address translation,
memory protection, and referenced and changed recording. It is also treated as a store with
respect to the ordering enforced by eieio and the ordering enforced by the combination of
caching-inhibited and guarded attributes for a page (or block).
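Because dcbz clears the entire block containing EA, not only the bytes from EA onward, a buffer cleared this way must be block-aligned and a whole number of blocks long, or neighboring data is lost. The C sketch simulates that visible effect on a byte array standing in for memory; `CACHE_BLOCK` is an assumed 32-byte block size and `model_dcbz` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Assumed cache block size; the real value depends on the implementation. */
#define CACHE_BLOCK 32u

/* Simulates the visible effect of dcbz: every byte of the block that
   contains ea is cleared, including bytes that precede ea. */
static void model_dcbz(uint8_t *mem, uint32_t ea) {
    uint32_t base = ea & ~(CACHE_BLOCK - 1);
    memset(mem + base, 0, CACHE_BLOCK);
}
```

Calling `model_dcbz(mem, 0x25)` clears bytes 0x20 through 0x3F and leaves the bytes on either side untouched.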

Other registers altered: 


• None
The PowerPC OEA describes how the dcbz instruction may establish a block in the data
cache without verifying that the associated physical address is valid. This scenario can 
cause a delayed machine check exception; see Chapter 6, “Exceptions,” for a discussion 
about this type of machine check exception. 


PowerPC Architecture Level Supervisor Level Optional Form 





VEA X


divwx divwx 


Divide Word 

divw rD,rA,rB (OE = 0 Rc = 0)

divw. rD,rA,rB (OE = 0 Rc = 1)

divwo rD,rA,rB (OE = 1 Rc = 0)

divwo. rD,rA,rB (OE = 1 Rc = 1)
31 D A B OE 491 Rc
0 5 6 10 11 15 16 20 21 22 30 31 


dividend ← (rA)
divisor ← (rB)
rD ← dividend ÷ divisor


The dividend is the contents of rA. The divisor is the contents of rB. The 32-bit quotient is 
formed and placed in rD. The remainder is not supplied as a result. 


Both the operands and the quotient are interpreted as signed integers. The quotient is the
unique signed integer that satisfies the equation dividend = (quotient * divisor) + r, where
0 ≤ r < |divisor| (if the dividend is non-negative), and -|divisor| < r ≤ 0 (if the dividend is
negative).


If an attempt is made to perform either of the divisions 0x8000_0000 ÷ -1 or
<anything> ÷ 0, then the contents of rD are undefined, as are the contents of the LT, GT,
and EQ bits of the CR0 field (if Rc = 1). In this case, if OE = 1 then OV is set.


The 32-bit signed remainder of dividing the contents of rA by the contents of rB can be
computed as follows, except in the case that the contents of rA = -2^31 and the contents of
rB = -1.


divw rD,rA,rB # rD = quotient 
mullw rD,rD,rB # rD = quotient * divisor 
subf rD,rD,rA # rD = remainder 
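The same quotient and remainder can be modeled in C, whose integer division truncates toward zero (as required since C99), giving a remainder with the sign of the dividend as the definition above requires. This is a sketch with hypothetical function names; it rejects the two operand combinations for which rD is undefined rather than guessing a value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models divw: returns false for the cases where rD is undefined
   (and OV would be set if OE = 1); otherwise stores the quotient. */
static bool divw_model(int32_t dividend, int32_t divisor, int32_t *quotient) {
    if (divisor == 0 || (dividend == INT32_MIN && divisor == -1))
        return false;
    *quotient = dividend / divisor;   /* truncates toward zero */
    return true;
}

/* Mirrors the divw / mullw / subf sequence: remainder = dividend - quotient * divisor.
   No overflow is possible here whenever divw_model succeeded. */
static int32_t divw_remainder(int32_t dividend, int32_t divisor, int32_t quotient) {
    return dividend - quotient * divisor;
}
```

For example, -7 ÷ 2 yields a quotient of -3 and a remainder of -1, satisfying -|divisor| < r ≤ 0 for a negative dividend.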
Other registers altered:
• Condition Register (CR0 field):


Affected: LT, GT, EQ, SO (if Rc = 1)
• XER:
Affected: SO, OV (if OE = 1) 


Note: The setting of the affected bits in the XER is mode-independent, and reflects 
overflow of the 32-bit result. 


PowerPC Architecture Level Supervisor Level Optional Form 


UISA XO


divwux divwux 


Divide Word Unsigned 





divwu rD,rA,rB (OE = 0 Rc = 0)
divwu. rD,rA,rB (OE = 0 Rc = 1)
divwuo rD,rA,rB (OE = 1 Rc = 0)
divwuo. rD,rA,rB (OE = 1 Rc = 1)
31 D A B OE 459 Rc
0 5 6 10 11 15 16 20 21 22 30 31 


dividend ← (rA)
divisor ← (rB)
rD ← dividend ÷ divisor


The dividend is the contents of rA. The divisor is the contents of rB. A 32-bit quotient is 
formed. The 32-bit quotient is placed into rD. The remainder is not supplied as a result. 


Both operands and the quotient are interpreted as unsigned integers, except that if Rc = 1
the first three bits of the CR0 field are set by signed comparison of the result to zero. The
quotient is the unique unsigned integer that satisfies the equation dividend = (quotient *
divisor) + r (where 0 ≤ r < divisor). If an attempt is made to perform the division
<anything> ÷ 0, then the contents of rD are undefined, as are the contents of the
LT, GT, and EQ bits of the CR0 field (if Rc = 1). In this case, if OE = 1 then OV is set.
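The CR0 behavior of divwu. can be surprising: the quotient is unsigned, but LT, GT, and EQ come from a signed comparison with zero. This hypothetical C sketch shows a large unsigned quotient setting LT; it assumes two's-complement reinterpretation and a nonzero divisor.

```c
#include <stdint.h>

/* LT/GT/EQ (as 0b100/0b010/0b001) from a signed comparison of a 32-bit result with zero. */
static unsigned cr0_from_result(uint32_t result) {
    int32_t s = (int32_t)result;   /* reinterpret as signed (two's complement assumed) */
    return (s < 0) ? 0x4u : (s > 0) ? 0x2u : 0x1u;
}

/* divwu quotient; the caller must ensure divisor != 0 (rD is undefined otherwise). */
static uint32_t divwu_model(uint32_t dividend, uint32_t divisor) {
    return dividend / divisor;     /* unsigned division truncates */
}
```

Dividing 0xFFFF_FFFE by 1 returns 0xFFFF_FFFE, a large positive unsigned value, yet divwu. would set LT in CR0 because the result is negative when read as a signed integer.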


The 32-bit unsigned remainder of dividing the contents of rA by the contents of rB can be 
computed as follows: 


divwu rD,rA,rB # rD = quotient 
mullw rD,rD,rB # rD = quotient * divisor 
subf rD,rD,rA # rD = remainder 
Other registers altered:
• Condition Register (CR0 field):


Affected: LT, GT, EQ, SO (if Rc = 1)
• XER:
Affected: SO, OV (if OE = 1) 


Note: The setting of the affected bits in the XER is mode-independent, and reflects 
overflow of the 32-bit result. 
PowerPC Architecture Level Supervisor Level Optional Form
UISA XO

























eciwx eciwx 


External Control In Word Indexed 








eciwx rD,rA,rB 
[_] Reserved 
31 D A B 310 0 
0 5 6 10 11 15 16 20 21 30 31 


The eciwx instruction and the EAR register can be very efficient when mapping special 
devices such as graphics devices that use addresses as pointers. 

if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
paddr ← address translation of EA
send load word request for paddr to device identified by EAR[RID]
rD ← word from device


EA is the sum (rA|0) + (rB).
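The notation (rA|0) means that an rA field of 0 selects the value zero rather than the contents of GPR0. A minimal C sketch of the EA calculation and the word-alignment check required by eciwx and ecowx follows; the register array `gpr` and the function names are hypothetical.

```c
#include <stdint.h>

/* (rA|0) + (rB): an rA field of 0 means the literal value 0, not GPR0.
   gpr stands in for the 32 general-purpose registers. */
static uint32_t ea_x_form(const uint32_t gpr[32], unsigned ra, unsigned rb) {
    uint32_t base = (ra == 0) ? 0 : gpr[ra];
    return base + gpr[rb];          /* wraps modulo 2^32, like a 32-bit EA */
}

/* eciwx and ecowx require EA to be a multiple of four. */
static int ea_word_aligned(uint32_t ea) {
    return (ea & 3u) == 0;
}
```

Even if GPR0 holds a nonzero value, an instruction encoded with rA = 0 uses only (rB) as the EA.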


A load word request for the physical address (referred to as real address in the architecture 
specification) corresponding to EA is sent to the device identified by EAR[RID], bypassing 
the cache. The word returned by the device is placed in rD. 


EAR[E] must be 1. If it is not, a DSI exception is generated. 


EA must be a multiple of four. If it is not, one of the following occurs: 


• A system alignment exception is generated.
• A DSI exception is generated (possible only if EAR[E] = 0).
• The results are boundedly undefined.


The eciwx instruction is supported for EAs that reference memory segments in which
SR[T] = 0 and for EAs mapped by the DBAT registers. If the EA references a direct-store
segment (SR[T] = 1), either a DSI exception occurs or the results are boundedly undefined.
However, note that the direct-store facility is being phased out of the architecture and is
unlikely to be supported in future devices. Thus, software should not depend on its effects.


If this instruction is executed when MSR[DR] = 0 (real addressing mode), the results are 
boundedly undefined. 


This instruction is treated as a load from the addressed byte with respect to address 
translation, memory protection, referenced and changed bit recording, and the ordering 
performed by eieio. 


This instruction is optional in the PowerPC architecture. 
Other registers altered:


• None
PowerPC Architecture Level Supervisor Level Optional Form
VEA V X






















ecowx ecowx 


External Control Out Word Indexed 








ecowx rS,rA,rB
[_] Reserved 
31 S A B 438 0
0 5 6 10 11 15 16 20 21 30 31 


The ecowx instruction and the EAR register can be very efficient when mapping special 
devices such as graphics devices that use addresses as pointers. 


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
paddr ← address translation of EA
send store word request for paddr to device identified by EAR[RID]
send rS to device


EA is the sum (rA|0) + (rB).


A store word request for the physical address corresponding to EA and the contents of rS 
are sent to the device identified by EAR[RID], bypassing the cache. 


EAR[E] must be 1. If it is not, a DSI exception is generated. EA must be a multiple of four.
If it is not, one of the following occurs:


• A system alignment exception is generated.
• A DSI exception is generated (possible only if EAR[E] = 0).
• The results are boundedly undefined.


The ecowx instruction is supported for effective addresses that reference memory segments
in which SR[T] = 0 and for EAs mapped by the DBAT registers. If the EA references a
direct-store segment (SR[T] = 1), either a DSI exception occurs or the results are boundedly
undefined. However, note that the direct-store facility is being phased out of the architecture
and is unlikely to be supported in future devices. Thus, software should not depend on its
effects.


If this instruction is executed when MSR[DR] = 0 (real addressing mode), the results are 
boundedly undefined. 


This instruction is treated as a store to the addressed byte with respect to address
translation, memory protection, referenced and changed bit recording, and the ordering
performed by eieio. Note that software synchronization is required in order to ensure that
the data access is performed in program order with respect to data accesses caused by other
store or ecowx instructions, even though the addressed byte is assumed to be
caching-inhibited and guarded.
This instruction is optional in the PowerPC architecture.


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





VEA V X


eieio eieio


Enforce In-Order Execution of I/O 








eieio
[_] Reserved
31 00000 00000 00000 854 0
0 5 6 10 11 15 16 20 21 30 31


The eieio instruction provides an ordering function for the effects of load and store
instructions executed by a processor. These loads and stores are divided into two sets, which
are ordered separately. The memory accesses caused by a dcbz or a dcba instruction are
ordered like a store. The two sets follow:


1. Loads and stores to memory that is both caching-inhibited and guarded, and stores 
to memory that is write-through required. 


The eieio instruction controls the order in which the accesses are performed in main 
memory. It ensures that all applicable memory accesses caused by instructions 
preceding the eieio instruction have completed with respect to main memory before 
any applicable memory accesses caused by instructions following the eieio 
instruction access main memory. It acts like a barrier that flows through the memory 
queues and to main memory, preventing the reordering of memory accesses across 
the barrier. No ordering is performed for dcbz if the instruction causes the system
alignment error handler to be invoked.


All accesses in this set are ordered as a single set—that is, there is not one order for 
loads and stores to caching-inhibited and guarded memory and another order for 
stores to write-through required memory. 


2. Stores to memory that has all of the following attributes: caching-allowed,
write-through not required, and memory-coherency required.


The eieio instruction controls the order in which the accesses are performed with 
respect to coherent memory. It ensures that all applicable stores caused by 
instructions preceding the eieio instruction have completed with respect to coherent 
memory before any applicable stores caused by instructions following the eieio 
instruction complete with respect to coherent memory. 


With the exception of dcbz and dcba, eieio does not affect the order of cache operations 
(whether caused explicitly by execution of a cache management instruction, or implicitly 
by the cache coherency mechanism). For more information, refer to Chapter 5, “Cache 
Model and Memory Coherency.” The eieio instruction does not affect the order of accesses 
in one set with respect to accesses in the other set. 
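The two ordering sets can be summarized as a classification over the storage attributes of an access. The following Python sketch is illustrative only: it assumes the caching-inhibited, guarded, write-through required, and memory-coherency required attributes (the I, G, W, and M bits) of the access are already known, and the function name `eieio_set` is hypothetical.

```python
from typing import Optional


def eieio_set(is_store: bool, caching_inhibited: bool, guarded: bool,
              write_through: bool, coherent: bool) -> Optional[int]:
    """Return the eieio ordering set (1 or 2) for an access, or None.

    Hypothetical helper for illustration. Accesses caused by dcbz or dcba
    should be passed with is_store=True, because eieio orders them like stores.
    """
    # Set 1: loads and stores to memory that is caching-inhibited AND guarded...
    if caching_inhibited and guarded:
        return 1
    # ...plus stores to write-through required memory.
    if is_store and write_through:
        return 1
    # Set 2: stores to caching-allowed, write-through not required,
    # memory-coherency required memory.
    if is_store and not caching_inhibited and not write_through and coherent:
        return 2
    # Anything else (for example, an ordinary cacheable load) is not ordered.
    return None
```

Under this reading, a pair of accesses separated by eieio is ordered by it only when both map to the same set (and neither maps to None).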


The eieio instruction may complete before memory accesses caused by instructions 
preceding the eieio instruction have been performed with respect to main memory or 
coherent memory as appropriate. 
The eieio instruction is intended for use in managing shared data structures, in accessing
memory-mapped I/O, and in preventing load/store combining operations in main memory. 
For the first use, the shared data structure and the lock that protects it must be altered only 
by stores that are in the same set (1 or 2; see previous discussion). For the second use, eieio 
can be thought of as placing a barrier into the stream of memory accesses issued by a 
processor, such that any given memory access appears to be on the same side of the barrier 
to both the processor and the I/O device. 


Because the processor performs store operations in order to memory that is designated as 
both caching-inhibited and guarded (refer to Section 5.1.1, “Memory Access Ordering”), 
the eieio instruction is needed for such memory only when loads must be ordered with 
respect to stores or with respect to other loads. 


Note that the architecture does not tie eieio to any particular hardware mechanism. Some
multiprocessor implementations, for example, send an eieio address-only broadcast, which is
useful in some designs: if a design has an external buffer that reorders loads and stores for
better bus efficiency, the eieio broadcast signals to that buffer that previous loads and stores
(marked caching-inhibited, guarded, or write-through required) must complete before any
following loads and stores (marked caching-inhibited, guarded, or write-through required).


Other registers altered: 


• None


PowerPC Architecture Level: VEA; Supervisor Level: No; Optional: No; Form: X



eqvx eqvx 


Equivalent 


eqv rA,rS,rB (Rc = 0)
eqv. rA,rS,rB (Rc = 1)

Fields: 31 | S | A | B | 284 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31


rA ← (rS) ≡ (rB)

The contents of rS are XORed with the contents of rB and the complemented result is 
placed into rA. 
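As a quick check of the semantics, the following Python sketch models eqv on 32-bit register values held in ordinary integers. The helper names `eqv` and `cr0_for` are hypothetical; the CR0 rule shown (signed comparison of the result with zero, SO copied from XER[SO]) is the usual Rc = 1 behavior for integer instructions.

```python
MASK32 = 0xFFFFFFFF


def eqv(rs: int, rb: int) -> int:
    """rA <- (rS) EQV (rB): XOR the two 32-bit values, then complement."""
    return ~(rs ^ rb) & MASK32


def cr0_for(result: int, xer_so: int = 0) -> int:
    """4-bit CR0 value (LT, GT, EQ, SO) recorded by eqv. for a 32-bit result.

    The result is compared with zero as a signed value; SO is a copy of XER[SO].
    """
    signed = result - (1 << 32) if result & 0x80000000 else result
    if signed < 0:
        flags = 0b100          # LT
    elif signed > 0:
        flags = 0b010          # GT
    else:
        flags = 0b001          # EQ
    return (flags << 1) | xer_so
```

Equal bit positions produce 1s, so `eqv(x, x)` is always 0xFFFFFFFF, which CR0 records as negative (LT).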
Other registers altered: 


• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: X


extsbx extsbx


Extend Sign Byte 





extsb rA,rS (Rc = 0)
extsb. rA,rS (Rc = 1)

Fields: 31 | S | A | 00000 | 954 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31

S ← rS[24]
rA[24-31] ← rS[24-31]
rA[0-23] ← (24)S


The contents of rS[24-31] are placed into rA[24-31]. Bit 24 of rS (the sign bit of the low-order byte) is replicated into each bit of rA[0-23].
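A minimal Python sketch of the same operation, treating a 32-bit GPR as an unsigned integer (the function name `extsb` simply reuses the mnemonic for illustration):

```python
def extsb(rs: int) -> int:
    """Model of extsb on a 32-bit value: sign-extend the low-order byte.

    In PowerPC (big-endian) bit numbering, rS[24] is the most-significant
    bit of the low-order byte, i.e. mask 0x80 counting from the LSB.
    """
    low_byte = rs & 0xFF                  # rA[24-31] <- rS[24-31]
    if low_byte & 0x80:                   # S = rS[24] is 1
        return 0xFFFFFF00 | low_byte      # rA[0-23] <- 24 copies of 1
    return low_byte                       # rA[0-23] <- 24 copies of 0
```

The upper 24 bits of rS never influence the result; only bit 24 decides the fill value.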


Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: X



extshx extshx 


Extend Sign Half Word 


extsh rA,rS (Rc = 0)
extsh. rA,rS (Rc = 1)

[POWER mnemonics: exts, exts.]

Fields: 31 | S | A | 00000 | 922 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31

S ← rS[16]
rA[16-31] ← rS[16-31]
rA[0-15] ← (16)S


The contents of rS[16-31] are placed into rA[16-31]. Bit 16 of rS (the sign bit of the low-order half word) is replicated into each bit of rA[0-15].
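Sign extension of a half word can also be written with a branch-free idiom. The following Python sketch is one possible model (not taken from the architecture definition), operating on 32-bit values stored in plain integers; the name `extsh` is reused from the mnemonic for illustration:

```python
MASK32 = 0xFFFFFFFF


def extsh(rs: int) -> int:
    """Model of extsh on a 32-bit value: sign-extend the low-order half word.

    XOR-ing with the sign bit and then subtracting it yields the signed
    value of rS[16-31]; masking converts it back to a 32-bit pattern.
    """
    half = rs & 0xFFFF                    # rS[16-31]
    signed = (half ^ 0x8000) - 0x8000     # range -32768..32767
    return signed & MASK32
```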


Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: X


fabsx fabsx


Floating Absolute Value 


fabs frD,frB (Rc = 0) 
fabs. frD,frB (Rc = 1) 

Fields: 63 | D | 00000 | B | 264 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31


The contents of frB with bit 0 cleared are placed into frD. 


Note that the fabs instruction treats NaNs just like any other kind of value. That is, the sign 
bit of a NaN may be altered by fabs. This instruction does not alter the FPSCR. 
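Because fabs operates only on the sign bit, it can be modeled directly on the 64-bit register image. In this Python sketch, `fpr_image` and `fabs_image` are hypothetical helper names, and Python floats are assumed to be IEEE 754 doubles (true for CPython on common platforms).

```python
import struct

SIGN_BIT = 1 << 63   # bit 0 of a 64-bit FPR in PowerPC numbering


def fpr_image(x: float) -> int:
    """64-bit IEEE double image of x, as it would sit in an FPR."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def fabs_image(frb: int) -> int:
    """Model of fabs: clear bit 0 (the sign), whatever kind of value it is."""
    return frb & ~SIGN_BIT
```

Applied to the image 0xFFF8_0000_0000_0000 (a negative QNaN), the sketch yields 0x7FF8_0000_0000_0000, a positive QNaN: the sign of a NaN is altered exactly as for any other value.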


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: X



faddx faddx 


Floating Add (Double-Precision) 


fadd frD,frA,frB (Rc = 0) 
fadd. frD,frA,frB (Rc = 1) 


[POWER mnemonics: fa, fa.]

Fields: 63 | D | A | B | 00000 | 21 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31


The floating-point operand in frA is added to the floating-point operand in frB. If the most- 
significant bit of the resultant significand is not a one, the result is normalized. The result 
is rounded to double-precision under control of the floating-point rounding control field RN 
of the FPSCR and placed into frD. 


Floating-point addition is based on exponent comparison and addition of the two 
significands. The exponents of the two operands are compared, and the significand 
accompanying the smaller exponent is shifted right, with its exponent increased by one for 
each bit shifted, until the two exponents are equal. The two significands are then added or 
subtracted as appropriate, depending on the signs of the operands. All 53 bits in the 
significand as well as all three guard bits (G, R, and X) enter into the computation. 


If a carry occurs, the sum's significand is shifted right one bit position and the exponent is 
increased by one. FPSCR[FPRF] is set to the class and sign of the result, except for invalid 
operation exceptions when FPSCR[VE] = 1. 
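The alignment and carry steps can be illustrated with integer significands. The Python sketch below is deliberately simplified: it accepts only positive, finite, nonzero doubles, and it discards bits shifted out instead of tracking the G, R, and X guard bits and rounding, so it agrees with a real fadd only when no nonzero bits are shifted out. The function name `align_and_add` is hypothetical.

```python
import math
from typing import Tuple


def align_and_add(a: float, b: float) -> Tuple[float, bool]:
    """Add two positive doubles by exponent alignment; return (sum, carry)."""
    if not (a > 0 and b > 0 and math.isfinite(a) and math.isfinite(b)):
        raise ValueError("sketch handles positive finite operands only")
    ma, ea = math.frexp(a)           # a = ma * 2**ea with 0.5 <= ma < 1
    mb, eb = math.frexp(b)
    sa = int(ma * 2**53)             # 53-bit integer significands
    sb = int(mb * 2**53)
    if ea < eb:                      # let 'a' hold the larger exponent
        sa, sb, ea, eb = sb, sa, eb, ea
    sb >>= ea - eb                   # shift smaller-exponent significand right
    total = sa + sb
    carry = total >= 2**53
    if carry:                        # carry: shift right one bit, bump exponent
        total >>= 1
        ea += 1
    return math.ldexp(total, ea - 53), carry
```

For example, 1.5 + 0.25 aligns 0.25 by two bit positions and needs no carry, while 1.5 + 1.5 overflows the 53-bit significand and takes the carry path, giving 3.0.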
Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A


faddsx faddsx


Floating Add Single

fadds frD,frA,frB (Rc = 0)
fadds. frD,frA,frB (Rc = 1)

Fields: 59 | D | A | B | 00000 | 21 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31


The floating-point operand in frA is added to the floating-point operand in frB. If the most- 
significant bit of the resultant significand is not a one, the result is normalized. The result 
is rounded to single-precision under control of the floating-point rounding control field
RN of the FPSCR and placed into frD. 
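The final rounding step can be approximated in Python by passing a double through the IEEE single-precision format with the standard `struct` module, which rounds to nearest (ties to even), the behavior selected by FPSCR[RN] = 0b00. The sketch assumes both inputs are already single-precision values; for such inputs, computing the sum in double precision and then rounding it to single gives the same result as a correctly rounded single-precision add, a known property of the IEEE binary formats. The names `round_to_single` and `fadds_sketch` are hypothetical.

```python
import struct


def round_to_single(x: float) -> float:
    """Round a double to the nearest single-precision value (ties to even).

    struct raises OverflowError if x is finite but too large for single.
    """
    return struct.unpack(">f", struct.pack(">f", x))[0]


def fadds_sketch(a: float, b: float) -> float:
    """Model of fadds (RN = round to nearest) for single-precision inputs."""
    return round_to_single(a + b)
```

Adding 2^-24 to 1.0 lands exactly halfway between two single-precision neighbors, so the tie rounds to the even value 1.0; a slightly larger addend such as 3 * 2^-25 rounds up to 1 + 2^-23.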


Floating-point addition is based on exponent comparison and addition of the two 
significands. The exponents of the two operands are compared, and the significand 
accompanying the smaller exponent is shifted right, with its exponent increased by one for 
each bit shifted, until the two exponents are equal. The two significands are then added or 
subtracted as appropriate, depending on the signs of the operands. All 53 bits in the 
significand as well as all three guard bits (G, R, and X) enter into the computation. 


If a carry occurs, the sum's significand is shifted right one bit position and the exponent is 
increased by one. FPSCR[FPRF] is set to the class and sign of the result, except for invalid 
operation exceptions when FPSCR[VE] = 1. 
Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A



fcmpo fcmpo

Floating Compare Ordered

fcmpo crfD,frA,frB

Fields: 63 | crfD | 00 | A | B | 32 | 0
Bits: 0-5 | 6-8 | 9-10 | 11-15 | 16-20 | 21-30 | 31


if (frA) is a NaN or
   (frB) is a NaN then c ← 0b0001
else if (frA) < (frB) then c ← 0b1000
else if (frA) > (frB) then c ← 0b0100
else c ← 0b0010

FPCC ← c
CR[(4 * crfD)-(4 * crfD + 3)] ← c

if (frA) is an SNaN or
   (frB) is an SNaN then
     VXSNAN ← 1
     if VE = 0 then VXVC ← 1
else if (frA) is a QNaN or
        (frB) is a QNaN then VXVC ← 1


The floating-point operand in frA is compared to the floating-point operand in frB. The
result of the compare is placed into CR field crfD and the FPCC.

If one of the operands is a NaN, either quiet or signaling, then CR field crfD and the FPCC
are set to reflect unordered. If one of the operands is a signaling NaN, then VXSNAN is set,
and if invalid operation is disabled (VE = 0) then VXVC is set. Otherwise, if one of the
operands is a QNaN, then VXVC is set.
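The CR update in the description above uses PowerPC bit numbering, in which bit 0 is the most-significant bit of the 32-bit CR. A small Python sketch of that mapping (the helper name `set_cr_field` is hypothetical):

```python
def set_cr_field(cr: int, crfd: int, c: int) -> int:
    """Return a copy of the 32-bit CR with field crfD (0-7) replaced by c.

    Field crfD occupies CR bits 4*crfD through 4*crfD + 3, counted from the
    most-significant end, so it sits 28 - 4*crfD bits above the LSB.
    """
    if not (0 <= crfd <= 7 and 0 <= c <= 0xF):
        raise ValueError("crfD must be 0-7 and c a 4-bit code")
    shift = 28 - 4 * crfd            # big-endian bit numbering -> LSB shift
    mask = 0xF << shift
    return (cr & ~mask & 0xFFFFFFFF) | (c << shift)
```

With crfD = 0, an unordered result (0b0001) lands in CR bit 3, the UN position of CR0.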


Other registers altered: 


• Condition Register (CR field specified by operand crfD):
Affected: LT, GT, EQ, UN
• Floating-Point Status and Control Register:
Affected: FPCC, FX, VXSNAN, VXVC

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: X


fcmpu fcmpu


Floating Compare Unordered 


fcmpu crfD,frA,frB

Fields: 63 | crfD | 00 | A | B | 0 | 0
Bits: 0-5 | 6-8 | 9-10 | 11-15 | 16-20 | 21-30 | 31


if (frA) is a NaN or
   (frB) is a NaN then c ← 0b0001
else if (frA) < (frB) then c ← 0b1000
else if (frA) > (frB) then c ← 0b0100
else c ← 0b0010

FPCC ← c
CR[(4 * crfD)-(4 * crfD + 3)] ← c

if (frA) is an SNaN or
   (frB) is an SNaN then
     VXSNAN ← 1


The floating-point operand in register frA is compared to the floating-point operand in
register frB. The result of the compare is placed into CR field crfD and the FPCC.

If one of the operands is a NaN, either quiet or signaling, then CR field crfD and the FPCC
are set to reflect unordered. If one of the operands is a signaling NaN, then VXSNAN is set.
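The NaN handling of fcmpu and fcmpo differs only in how VXVC is set. The following Python sketch models both on raw 64-bit register images so that signaling and quiet NaNs can be told apart (a signaling NaN is taken to be one whose most-significant fraction bit is 0). The function names are hypothetical; `fcmp` returns the 4-bit comparison code together with the VXSNAN and VXVC values the instruction would set, and it does not model the FX summary bit or trapping.

```python
import struct

EXP_MASK = 0x7FF << 52
FRAC_MASK = (1 << 52) - 1
QUIET_BIT = 1 << 51          # most-significant fraction bit


def is_nan(bits: int) -> bool:
    return (bits & EXP_MASK) == EXP_MASK and (bits & FRAC_MASK) != 0


def is_snan(bits: int) -> bool:
    return is_nan(bits) and not (bits & QUIET_BIT)


def as_double(bits: int) -> float:
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def fcmp(fra: int, frb: int, ordered: bool, ve: int = 0):
    """Return (c, vxsnan, vxvc); ordered=True models fcmpo, False fcmpu."""
    if is_nan(fra) or is_nan(frb):
        c = 0b0001                       # unordered
    elif as_double(fra) < as_double(frb):
        c = 0b1000                       # less than
    elif as_double(fra) > as_double(frb):
        c = 0b0100                       # greater than
    else:
        c = 0b0010                       # equal (+0 and -0 compare equal)
    vxsnan = vxvc = 0
    if is_snan(fra) or is_snan(frb):
        vxsnan = 1
        if ordered and ve == 0:          # fcmpo only, invalid op disabled
            vxvc = 1
    elif ordered and (is_nan(fra) or is_nan(frb)):
        vxvc = 1                         # fcmpo only: a QNaN operand
    return c, vxsnan, vxvc
```

Comparing 1.0 with a QNaN therefore reports unordered in both cases, but only the ordered form records VXVC.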


Other registers altered: 
• Condition Register (CR field specified by operand crfD):
Affected: LT, GT, EQ, UN
• Floating-Point Status and Control Register:
Affected: FPCC, FX, VXSNAN

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: X



fctiwx fctiwx 


Floating Convert to Integer Word

fctiw frD,frB (Rc = 0)
fctiw. frD,frB (Rc = 1)

Fields: 63 | D | 00000 | B | 14 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31


The floating-point operand in register frB is converted to a 32-bit signed integer, using the 
rounding mode specified by FPSCR[RN], and placed in bits 32-63 of frD. Bits 0-31 of frD 
are undefined. 


If the operand in frB is greater than $2^{31} - 1$, bits 32-63 of frD are set to 0x7FFF_FFFF.
If the operand in frB is less than $-2^{31}$, bits 32-63 of frD are set to 0x8000_0000.
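The saturating conversion can be sketched in Python as follows. This is a simplified model, not the full algorithm of Section D.4.2: it handles finite values and infinities, deliberately rejects NaNs, and maps the four FPSCR[RN] settings onto Python rounding functions (`round` on a float rounds half to even). The name `fctiw_sketch` and the `ROUNDING` table are hypothetical. It returns the 32-bit pattern that would be placed in bits 32-63 of frD; rounding first and then clamping yields the same saturated patterns as comparing the operand against the limits.

```python
import math

INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

# FPSCR[RN] settings mapped onto Python rounding functions.
ROUNDING = {
    0b00: round,        # round to nearest (ties to even for floats)
    0b01: math.trunc,   # round toward zero
    0b10: math.ceil,    # round toward +infinity
    0b11: math.floor,   # round toward -infinity
}


def fctiw_sketch(x: float, rn: int = 0b00) -> int:
    """Return the 32-bit pattern fctiw would place in frD[32-63]."""
    if math.isnan(x):
        raise ValueError("NaN operands are not modeled in this sketch")
    if math.isinf(x):
        n = INT32_MAX if x > 0 else INT32_MIN
    else:
        n = min(max(ROUNDING[rn](x), INT32_MIN), INT32_MAX)
    return n & 0xFFFFFFFF
```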


The conversion is described fully in Section D.4.2, “Floating-Point Convert to Integer 
Model.” 


Except for trap-enabled invalid operation exceptions, FPSCR[FPRF] is undefined. 
FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result 
is inexact. 
Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF (undefined), FR, FI, FX, XX, VXSNAN, VXCVI

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: X


fctiwzx fctiwzx


Floating Convert to Integer Word with Round toward Zero 


fctiwz frD,frB (Rc = 0) 
fctiwz. frD,frB (Rc = 1)

Fields: 63 | D | 00000 | B | 15 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31


The floating-point operand in register frB is converted to a 32-bit signed integer, using the 
rounding mode round toward zero, and placed in bits 32-63 of frD. Bits 0-31 of frD are 
undefined. 


If the operand in frB is greater than $2^{31} - 1$, bits 32-63 of frD are set to 0x7FFF_FFFF.
If the operand in frB is less than $-2^{31}$, bits 32-63 of frD are set to 0x8000_0000.
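Round toward zero is the rounding used by C-style float-to-integer conversion, which makes this form a natural fit for such conversions. The following Python sketch (hypothetical name `fctiwz_sketch`; finite values and infinities only, NaNs rejected) relies on `int()` truncating toward zero and saturates at the limits given above.

```python
import math


def fctiwz_sketch(x: float) -> int:
    """Return the 32-bit pattern fctiwz would place in frD[32-63]."""
    if math.isnan(x):
        raise ValueError("NaN operands are not modeled in this sketch")
    if x >= 2.0**31:                 # truncates to at least 2**31 (incl. +inf)
        return 0x7FFFFFFF
    if x <= -(2.0**31):              # -2**31 itself also maps to 0x8000_0000
        return 0x80000000
    return int(x) & 0xFFFFFFFF       # int() truncates toward zero
```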


The conversion is described fully in Section D.4.2, “Floating-Point Convert to Integer 
Model.” 


Except for trap-enabled invalid operation exceptions, FPSCR[FPRF] is undefined. 
FPSCR[FR] is set if the result is incremented when rounded. FPSCR[FI] is set if the result 
is inexact. 
Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF (undefined), FR, FI, FX, XX, VXSNAN, VXCVI

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: X



fdivx fdivx 


Floating Divide (Double-Precision) 

fdiv frD,frA,frB (Rc = 0)
fdiv. frD,frA,frB (Rc = 1)
[POWER mnemonics: fd, fd.]

Fields: 63 | D | A | B | 00000 | 18 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31


The floating-point operand in register frA is divided by the floating-point operand in 
register frB. The remainder is not supplied as a result. 


If the most-significant bit of the resultant significand is not a one, the result is normalized. 
The result is rounded to double-precision under control of the floating-point rounding 
control field RN of the FPSCR and placed into frD. 


Floating-point division is based on exponent subtraction and division of the significands. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1 and zero divide exceptions when FPSCR[ZE] = 1. 


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, ZX, XX, VXSNAN, VXIDI, VXZDZ

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A


fdivsx fdivsx


Floating Divide Single 


fdivs frD,frA,frB (Rc = 0) 
fdivs. frD,frA,frB (Rc = 1) 

Fields: 59 | D | A | B | 00000 | 18 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31


The floating-point operand in register frA is divided by the floating-point operand in 
register frB. The remainder is not supplied as a result. 


If the most-significant bit of the resultant significand is not a one, the result is normalized. 
The result is rounded to single-precision under control of the floating-point rounding 
control field RN of the FPSCR and placed into frD. 


Floating-point division is based on exponent subtraction and division of the significands. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1 and zero divide exceptions when FPSCR[ZE] = 1. 


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, ZX, XX, VXSNAN, VXIDI, VXZDZ

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A



fmaddx fmaddx 


Floating Multiply-Add (Double-Precision) 


fmadd frD,frA,frC,frB (Rc = 0) 
fmadd. frD,frA,frC,frB (Rc = 1)

[POWER mnemonics: fma, fma.]

Fields: 63 | D | A | B | C | 29 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31

The following operation is performed: 


frD ← (frA * frC) + frB

The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. The floating-point operand in register frB is added to this intermediate result. 


If the most-significant bit of the resultant significand is not a one, the result is normalized. 
The result is rounded to double-precision under control of the floating-point rounding 
control field RN of the FPSCR and placed into frD. 
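The key property of fmadd is that the product is not rounded before the addition; only the final result is rounded. The Python sketch below illustrates this with exact rational arithmetic from the standard `fractions` module, converting back to a double with a single round-to-nearest step. It is an illustration rather than a bit-exact model: it assumes finite operands, ignores rounding modes other than round to nearest, and does not preserve the sign of an exact zero result. The name `fmadd_sketch` is hypothetical.

```python
from fractions import Fraction


def fmadd_sketch(a: float, c: float, b: float) -> float:
    """(a * c) + b computed exactly, then rounded once to double."""
    exact = Fraction(a) * Fraction(c) + Fraction(b)
    return float(exact)   # single rounding to nearest
```

For example, with a = 1 + 2^-30 and b = -(1 + 2^-29), evaluating `a * a + b` with separate operations gives 0.0, because the rounded product drops its 2^-60 term, while `fmadd_sketch(a, a, b)` returns exactly 2^-60.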


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI, VXIMZ

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A


fmaddsx fmaddsx


Floating Multiply-Add Single

fmadds frD,frA,frC,frB (Rc = 0)
fmadds. frD,frA,frC,frB (Rc = 1)

Fields: 59 | D | A | B | C | 29 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31


The following operation is performed: 


frD ← (frA * frC) + frB

The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. The floating-point operand in register frB is added to this intermediate result. 


If the most-significant bit of the resultant significand is not a one, the result is normalized. 
The result is rounded to single-precision under control of the floating-point rounding 
control field RN of the FPSCR and placed into frD. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 
Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI, VXIMZ

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A



fmrx fmrx 


Floating Move Register 


fmr frD,frB (Rc = 0) 
fmr. frD,frB (Rc = 1)

Fields: 63 | D | 00000 | B | 72 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31


The contents of register frB are placed into frD. 


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: X


fmsubx fmsubx


Floating Multiply-Subtract (Double-Precision) 

fmsub frD,frA,frC,frB (Rc = 0)
fmsub. frD,frA,frC,frB (Rc = 1)
[POWER mnemonics: fms, fms.]

Fields: 63 | D | A | B | C | 28 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31

The following operation is performed: 


frD ← (frA * frC) - frB

The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. The floating-point operand in register frB is subtracted from this intermediate 
result. 


If the most-significant bit of the resultant significand is not a one, the result is normalized. 
The result is rounded to double-precision under control of the floating-point rounding 
control field RN of the FPSCR and placed into frD. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI, VXIMZ

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A



fmsubsx fmsubsx


Floating Multiply-Subtract Single 


fmsubs frD,frA,frC,frB (Rc = 0) 
fmsubs. frD,frA,frC,frB (Rc = 1)

Fields: 59 | D | A | B | C | 28 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31


The following operation is performed: 


frD ← (frA * frC) - frB

The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. The floating-point operand in register frB is subtracted from this intermediate 
result. 


If the most-significant bit of the resultant significand is not a one, the result is normalized. 
The result is rounded to single-precision under control of the floating-point rounding 
control field RN of the FPSCR and placed into frD. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI, VXIMZ

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A


fmulx fmulx


Floating Multiply (Double-Precision) 


fmul frD,frA,frC (Rc = 0) 
fmul. frD,frA,frC (Rc = 1) 


[POWER mnemonics: fm, fm.]

Fields: 63 | D | A | 00000 | C | 25 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31


The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. 


If the most-significant bit of the resultant significand is not a one, the result is normalized. 
The result is rounded to double-precision under control of the floating-point rounding 
control field RN of the FPSCR and placed into frD. 


Floating-point multiplication is based on exponent addition and multiplication of the 
significands. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 
Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXIMZ

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A



fmulsx fmulsx 


Floating Multiply Single

fmuls frD,frA,frC (Rc = 0)
fmuls. frD,frA,frC (Rc = 1)

Fields: 59 | D | A | 00000 | C | 25 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31


The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. 


If the most-significant bit of the resultant significand is not a one, the result is normalized. 
The result is rounded to single-precision under control of the floating-point rounding 
control field RN of the FPSCR and placed into frD. 


Floating-point multiplication is based on exponent addition and multiplication of the 
significands. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXIMZ

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A

Chapter 8. Instruction Set 8-81 


fnabsx fnabsx 


Floating Negative Absolute Value

fnabs frD,frB (Rc = 0)
fnabs. frD,frB (Rc = 1)

Fields: 63 | D | 00000 | B | 136 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31


The contents of register frB with bit 0 set are placed into frD. 


Note that the fnabs instruction treats NaNs just like any other kind of value. That is, the 
sign bit of a NaN may be altered by fnabs. This instruction does not alter the FPSCR. 
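Like fabs, fnabs touches only the sign bit, so for any non-NaN operand its result equals `math.copysign(x, -1.0)`. A Python sketch on 64-bit register images (the helper names `fnabs_image` and `image_of` are hypothetical):

```python
import struct

SIGN_BIT = 1 << 63   # bit 0 in PowerPC numbering


def fnabs_image(frb: int) -> int:
    """Model of fnabs: set bit 0 (the sign) of the 64-bit image."""
    return frb | SIGN_BIT


def image_of(x: float) -> int:
    """64-bit IEEE double image of x."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]
```

Note that +0 becomes -0 and a positive QNaN image becomes a negative one, since no value is treated specially.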


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: X



fnegx fnegx 


Floating Negate 


fneg frD,frB (Rc = 0) 
fneg. frD,frB (Rc = 1)

Fields: 63 | D | 00000 | B | 40 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-30 | 31


The contents of register frB with bit 0 inverted are placed into frD. 


Note that the fneg instruction treats NaNs just like any other kind of value. That is, the sign 
bit of a NaN may be altered by fneg. This instruction does not alter the FPSCR. 
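fneg inverts the sign bit, so applying it twice restores the original image exactly, NaN payloads included. A short Python sketch on 64-bit images held as integers (the helper name `fneg_image` is hypothetical):

```python
SIGN_BIT = 1 << 63   # bit 0 in PowerPC numbering


def fneg_image(frb: int) -> int:
    """Model of fneg: invert bit 0 (the sign) of the 64-bit image."""
    return frb ^ SIGN_BIT
```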


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: X



fnmaddx fnmaddx 


Floating Negative Multiply-Add (Double-Precision) 


fnmadd frD,frA,frC,frB (Rc = 0) 
fnmadd. frD,frA,frC,frB (Rc = 1)

[POWER mnemonics: fnma, fnma.]

Fields: 63 | D | A | B | C | 31 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31
The following operation is performed: 


frD ← -((frA * frC) + frB)

The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. The floating-point operand in register frB is added to this intermediate result. 
If the most-significant bit of the resultant significand is not a one, the result is normalized. 
The result is rounded to double-precision under control of the floating-point rounding 
control field RN of the FPSCR, then negated and placed into frD. 


This instruction produces the same result as would be obtained by using the Floating 
Multiply-Add (fmaddx) instruction and then negating the result, with the following 
exceptions: 


• QNaNs propagate with no effect on their sign bit.
• QNaNs that are generated as the result of a disabled invalid operation exception have
a sign bit of zero.
• SNaNs that are converted to QNaNs as the result of a disabled invalid operation
exception retain the sign bit of the SNaN.
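Taken together, these three exceptions mean that fnmadd negates every result of the corresponding fmadd except a NaN, which passes through with its sign unchanged: a propagated QNaN keeps its sign, the default QNaN generated by a disabled invalid operation stays positive, and a quieted SNaN keeps the SNaN's sign. The Python sketch below expresses that reading on 64-bit result images; the function name is hypothetical, and it assumes a result is actually written (that is, no enabled exception suppresses it).

```python
SIGN_BIT = 1 << 63
EXP_MASK = 0x7FF << 52
FRAC_MASK = (1 << 52) - 1


def is_nan(image: int) -> bool:
    return (image & EXP_MASK) == EXP_MASK and (image & FRAC_MASK) != 0


def fnmadd_from_fmadd(fmadd_result: int) -> int:
    """fnmadd result image derived from the corresponding fmadd result image."""
    if is_nan(fmadd_result):
        return fmadd_result            # NaNs: sign bit left alone
    return fmadd_result ^ SIGN_BIT     # everything else, including 0 and inf: negated
```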


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI, VXIMZ

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A



fnmaddsx fnmaddsx


Floating Negative Multiply-Add Single

fnmadds frD,frA,frC,frB (Rc = 0)
fnmadds. frD,frA,frC,frB (Rc = 1)

Fields: 59 | D | A | B | C | 31 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31


The following operation is performed: 


frD ← -((frA * frC) + frB)

The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. The floating-point operand in register frB is added to this intermediate result. 
If the most-significant bit of the resultant significand is not a one, the result is normalized. 
The result is rounded to single-precision under control of the floating-point rounding 
control field RN of the FPSCR, then negated and placed into frD. 


This instruction produces the same result as would be obtained by using the Floating 
Multiply-Add Single (fmaddsx) instruction and then negating the result, with the following 
exceptions: 


• QNaNs propagate with no effect on their sign bit.
• QNaNs that are generated as the result of a disabled invalid operation exception have
a sign bit of zero.
• SNaNs that are converted to QNaNs as the result of a disabled invalid operation
exception retain the sign bit of the SNaN.


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 
Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI, VXIMZ

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A



fnmsubx fnmsubx


Floating Negative Multiply-Subtract (Double-Precision) 


fnmsub frD,frA,frC,frB (Rc = 0) 
fnmsub. frD,frA,frC,frB (Rc = 1)
[POWER mnemonics: fnms, fnms.]

Fields: 63 | D | A | B | C | 30 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31
The following operation is performed: 


frD ← -((frA * frC) - frB)

The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. The floating-point operand in register frB is subtracted from this intermediate 
result. 


If the most-significant bit of the resultant significand is not one, the result is normalized. 
The result is rounded to double-precision under control of the floating-point rounding 
control field RN of the FPSCR, then negated and placed into frD. 


This instruction produces the same result obtained by negating the result of a Floating 
Multiply-Subtract (fmsubx) instruction with the following exceptions: 
• QNaNs propagate with no effect on their sign bit.
• QNaNs that are generated as the result of a disabled invalid operation exception have
a sign bit of zero.
• SNaNs that are converted to QNaNs as the result of a disabled invalid operation
exception retain the sign bit of the SNaN.


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI, VXIMZ

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A



fnmsubsx fnmsubsx


Floating Negative Multiply-Subtract Single 


fnmsubs frD,frA,frC,frB (Rc = 0) 
fnmsubs. frD,frA,frC,frB (Rc = 1) 

Fields: 59 | D | A | B | C | 30 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31


The following operation is performed: 


frD ← -((frA * frC) - frB)

The floating-point operand in register frA is multiplied by the floating-point operand in 
register frC. The floating-point operand in register frB is subtracted from this intermediate 
result. 


If the most-significant bit of the resultant significand is not one, the result is normalized. 
The result is rounded to single-precision under control of the floating-point rounding 
control field RN of the FPSCR, then negated and placed into frD. 


This instruction produces the same result obtained by negating the result of a Floating 
Multiply-Subtract Single (fmsubsx) instruction with the following exceptions: 
• QNaNs propagate with no effect on their sign bit.
• QNaNs that are generated as the result of a disabled invalid operation exception have
a sign bit of zero.
• SNaNs that are converted to QNaNs as the result of a disabled invalid operation
exception retain the sign bit of the SNaN.


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI, VXIMZ

PowerPC Architecture Level: UISA; Supervisor Level: No; Optional: No; Form: A



fresx fresx 


Floating Reciprocal Estimate Single 


fres frD,frB (Rc = 0) 
fres. frD,frB (Rc = 1) 

Fields: 59 | D | 00000 | B | 00000 | 24 | Rc
Bits: 0-5 | 6-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31


A single-precision estimate of the reciprocal of the floating-point operand in register frB is 
placed into register frD. The estimate placed into register frD is correct to a precision of 
one part in 256 of the reciprocal of frB. That is, 


$$\mathrm{ABS}\left(\frac{\mathit{estimate} - \frac{1}{x}}{\frac{1}{x}}\right) \le \frac{1}{256}$$

where x is the initial value in frB. Note that the value placed into register frD may vary
between implementations, and between different executions on the same implementation.


Operation with various special values of the operand is summarized below: 


Operand | Result | Exception
-∞ | -0 | None
-0 | -∞* | ZX
+0 | +∞* | ZX
+∞ | +0 | None
SNaN | QNaN** | VXSNAN
QNaN | QNaN | None

Notes: * No result if FPSCR[ZE] = 1
** No result if FPSCR[VE] = 1


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1 and zero divide exceptions when FPSCR[ZE] = 1. 
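The accuracy requirement can be checked without a division, because multiplying the quantity inside ABS by x gives estimate × x - 1. The helper below is a hypothetical test utility (its name is not from the architecture) for finite, nonzero x; the special operands in the table above are outside its scope.

```c
#include <stdbool.h>

/* Hypothetical check of the fres accuracy bound for finite, nonzero x:
 * |(estimate - 1/x) / (1/x)| == |estimate * x - 1| <= 1/256. */
static bool fres_estimate_ok(double x, double estimate)
{
    double rel_err = estimate * x - 1.0;
    if (rel_err < 0.0)
        rel_err = -rel_err;
    return rel_err <= 1.0 / 256.0;
}
```

An estimate that is off by one part in 512 passes; one that is off by one part in 128 does not.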


Note that the PowerPC architecture makes no provision for a double-precision version of 
the fresx instruction. This is because graphics applications are expected to need only the 
single-precision version, and no other important performance-critical applications are 
expected to require a double-precision version of the fresx instruction. 


This instruction is optional in the PowerPC architecture. 
Other registers altered:
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR (undefined), FI (undefined), FX, OX, UX, ZX, VXSNAN

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | Yes | A



frspx frspx 


Floating Round to Single 


frsp frD,frB (Rc = 0) 
frsp. frD,frB (Rc = 1)

[_] Reserved
63 D 00000 B 12 Rc
0 5 6 10 11 15 16 20 21 30 31 


The floating-point operand in register frB is rounded to single-precision using the rounding 
mode specified by FPSCR[RN] and placed into frD. 
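In C, converting a double to float rounds according to the current rounding mode, which by default is round-to-nearest (FPSCR[RN] = 0b00). The sketch below, with the hypothetical name frsp_model, assumes that default mode and ignores all FPSCR side effects; it widens the result back to double because frD holds a double-format value.

```c
/* Hypothetical model of frsp under round-to-nearest. */
static double frsp_model(double frB)
{
    float single = (float)frB;  /* round to single precision */
    return (double)single;      /* frD keeps the value in double format */
}
```

For instance, 1 + 2^-30 is closer to 1.0 than to the next single-precision value, so it rounds to 1.0.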


The rounding is described fully in Section D.4.1, “Floating-Point Round to Single- 
Precision Model.” 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 


Other registers altered:
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X



frsqrtex frsqrtex 


Floating Reciprocal Square Root Estimate 





frsqrte frD,frB (Rc = 0) 
frsqrte. frD,frB (Rc = 1) 
[_] Reserved 
63 D 00000 B 00000 26 Rc
0 5 6 10 11 15 16 20 21 25 26 30 31 


A double-precision estimate of the reciprocal of the square root of the floating-point 
operand in register frB is placed into register frD. The estimate placed into register frD is 
correct to a precision of one part in 32 of the reciprocal of the square root of frB. That is, 


$$\mathrm{ABS}\left(\frac{\mathit{estimate} - \frac{1}{\sqrt{x}}}{\frac{1}{\sqrt{x}}}\right) \le \frac{1}{32}$$


where x is the initial value in frB. Note that the value placed into register frD may vary 
between implementations, and between different executions on the same implementation. 


Operation with various special values of the operand is summarized below: 


Operand | Result | Exception
-∞ | QNaN** | VXSQRT
<0 | QNaN** | VXSQRT
-0 | -∞* | ZX
+0 | +∞* | ZX
+∞ | +0 | None
SNaN | QNaN** | VXSNAN
QNaN | QNaN | None

Notes: * No result if FPSCR[ZE] = 1
** No result if FPSCR[VE] = 1


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1 and zero divide exceptions when FPSCR[ZE] = 1. 
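As with fres, the bound can be tested without computing a square root: the condition is equivalent to 31/32 ≤ estimate × √x ≤ 33/32, and for a positive estimate, squaring gives a test on estimate² × x. The helper below is hypothetical and only covers finite x > 0.

```c
#include <stdbool.h>

/* Hypothetical check of the frsqrte accuracy bound for finite x > 0:
 * (31/32)^2 <= estimate^2 * x <= (33/32)^2 for a positive estimate. */
static bool frsqrte_estimate_ok(double x, double estimate)
{
    if (!(x > 0.0) || !(estimate > 0.0))
        return false;  /* special operands follow the table above */
    const double lo = (31.0 / 32.0) * (31.0 / 32.0);
    const double hi = (33.0 / 32.0) * (33.0 / 32.0);
    double s = estimate * estimate * x;
    return s >= lo && s <= hi;
}
```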


Note that no single-precision version of the frsqrte instruction is provided; however, both 
frB and frD are representable in single-precision format. 


This instruction is optional in the PowerPC architecture. 
Other registers altered:
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR (undefined), FI (undefined), FX, ZX, VXSNAN, VXSQRT

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | Yes | A



fselx fselx 


Floating Select 





fsel frD,frA,frC,frB (Rc = 0) 
fsel. frD,frA,frC,frB (Rc = 1)
63 D A B C 23 Rc
0 5 6 10 11 15 16 20 21 25 26 30 31 


if (frA) ≥ 0.0 then frD ← (frC)
else frD ← (frB)


The floating-point operand in register frA is compared to the value zero. If the operand is 
greater than or equal to zero, register frD is set to the contents of register frC. If the operand 
is less than zero or is a NaN, register frD is set to the contents of register frB. The 
comparison ignores the sign of zero (that is, regards +0 as equal to -0).


Care must be taken in using fsel if IEEE compatibility is required, or if the values being 
tested can be NaNs or infinities. 


For examples of uses of this instruction, see Section D.3, “Floating-Point Conversions,” 
and Section D.5, “Floating-Point Selection.” 
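The selection rule maps directly onto a C comparison, because (a >= 0.0) is true for both signs of zero and false for any NaN. The sketch below (the names fsel_model and fsel_max are hypothetical) also shows the kind of branch-free maximum that the caution above applies to.

```c
#include <math.h>  /* NAN, used in the usage note below */

/* Hypothetical model of fsel: frC if frA >= 0 (either zero), else frB. */
static double fsel_model(double frA, double frC, double frB)
{
    return (frA >= 0.0) ? frC : frB;  /* NaN compares false: frB */
}

/* Branch-free max(a, b) built on fsel. Not IEEE-safe: when b is a NaN,
 * a - b is a NaN, so b is returned, whereas fmax() would return a. */
static double fsel_max(double a, double b)
{
    return fsel_model(a - b, a, b);
}
```

With these definitions, fsel_model(-0.0, 1.0, 2.0) selects 1.0 and fsel_model(NAN, 1.0, 2.0) selects 2.0.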


This instruction is optional in the PowerPC architecture. 


Other registers altered:
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | Yes | A



fsqrtx fsqrtx 


Floating Square Root (Double-Precision) 


fsqrt frD,frB (Rc = 0) 
fsqrt. frD,frB (Rc = 1)
[_] Reserved
63 D 00000 B 00000 22 Rc
0 5 6 10 11 15 16 20 21 25 26 30 31


The square root of the floating-point operand in register frB is placed into register frD. 


If the most-significant bit of the resultant significand is not a one, the result is normalized.
The result is rounded to the target precision under control of the floating-point rounding 
control field RN of the FPSCR and placed into register frD. 


Operation with various special values of the operand is summarized below: 


Operand | Result | Exception
-∞ | QNaN* | VXSQRT
<0 | QNaN* | VXSQRT
-0 | -0 | None
+∞ | +∞ | None
SNaN | QNaN* | VXSNAN
QNaN | QNaN | None

Notes: * No result if FPSCR[VE] = 1


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 


This instruction is optional in the PowerPC architecture. 
Other registers altered:
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, XX, VXSNAN, VXSQRT

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | Yes | A



fsqrtsx fsqrtsx 


Floating Square Root Single 





fsqrts frD,frB (Rc = 0) 
fsqrts. frD,frB (Rc = 1)
[_] Reserved
59 D 00000 B 00000 22 Rc
0 5 6 10 11 15 16 20 21 25 26 30 31 


The square root of the floating-point operand in register frB is placed into register frD. 


If the most-significant bit of the resultant significand is not a one, the result is normalized.
The result is rounded to the target precision under control of the floating-point rounding 
control field RN of the FPSCR and placed into register frD. 


Operation with various special values of the operand is summarized below. 


Operand | Result | Exception
-∞ | QNaN* | VXSQRT
<0 | QNaN* | VXSQRT
-0 | -0 | None
+∞ | +∞ | None
SNaN | QNaN* | VXSNAN
QNaN | QNaN | None

Notes: * No result if FPSCR[VE] = 1


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 


This instruction is optional in the PowerPC architecture. 


Other registers altered:
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, XX, VXSNAN, VXSQRT

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | Yes | A



fsubx fsubx 


Floating Subtract (Double-Precision) 


fsub frD,frA,frB (Rc = 0) 
fsub. frD,frA,frB (Rc = 1) 
[POWER mnemonics: fs, fs.] 


[_] Reserved
63 D A B 00000 20 Rc
0 5 6 10 11 15 16 20 21 25 26 30 31


The floating-point operand in register frB is subtracted from the floating-point operand in 
register frA. If the most-significant bit of the resultant significand is not a one, the result is 
normalized. The result is rounded to double-precision under control of the floating-point 
rounding control field RN of the FPSCR and placed into frD. 


The execution of the fsub instruction is identical to that of fadd, except that the contents of 
frB participate in the operation with its sign bit (bit 0) inverted. 
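The sign-inversion description can be illustrated by flipping bit 0 of frB, which is the IEEE 754 sign bit, on the raw bit pattern and then adding. This is a sketch assuming the host uses IEEE 754 binary64 for double; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Invert bit 0 (PowerPC numbering), i.e. the most-significant sign bit. */
static double flip_sign_bit(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= UINT64_C(1) << 63;
    memcpy(&x, &bits, sizeof bits);
    return x;
}

/* Hypothetical model of fsub: fadd with the sign of frB inverted. */
static double fsub_model(double frA, double frB)
{
    return frA + flip_sign_bit(frB);
}
```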


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 


Other registers altered:
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | A



fsubsx fsubsx 


Floating Subtract Single 


fsubs frD,frA,frB (Rc = 0) 
fsubs. frD,frA,frB (Rc = 1) 
[_] Reserved 
59 D A B 00000 20 Rc
0 5 6 10 11 15 16 20 21 25 26 30 31


The floating-point operand in register frB is subtracted from the floating-point operand in 
register frA. If the most-significant bit of the resultant significand is not a one, the result is 
normalized. The result is rounded to single-precision under control of the floating-point 
rounding control field RN of the FPSCR and placed into frD. 


The execution of the fsubs instruction is identical to that of fadds, except that the contents 
of frB participate in the operation with its sign bit (bit 0) inverted. 


FPSCR[FPRF] is set to the class and sign of the result, except for invalid operation 
exceptions when FPSCR[VE] = 1. 


Other registers altered:
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPRF, FR, FI, FX, OX, UX, XX, VXSNAN, VXISI

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | A



icbi icbi 


Instruction Cache Block Invalidate 





icbi rA,rB 
[_] Reserved 
31 00000 A B 982 0
0 5 6 10 11 15 16 20 21 30 31
EA is the sum (rA|0) + (rB).


If the block containing the byte addressed by EA is in coherency-required mode, and a 
block containing the byte addressed by EA is in the instruction cache of any processor, the 
block is made invalid in all such instruction caches, so that subsequent references cause the 
block to be refetched. 


If the block containing the byte addressed by EA is in coherency-not-required mode, and a 
block containing the byte addressed by EA is in the instruction cache of this processor, the 
block is made invalid in that instruction cache, so that subsequent references cause the 
block to be refetched. 


The function of this instruction is independent of the write-through, write-back, and 
caching-inhibited/allowed modes of the block containing the byte addressed by EA. 


This instruction is treated as a load from the addressed byte with respect to address 
translation and memory protection. It may also be treated as a load for referenced and 
changed bit recording except that referenced and changed bit recording may not occur. 
Implementations with a combined data and instruction cache treat the icbi instruction as a 
no-op, except that they may invalidate the target block in the instruction caches of other 
processors if the block is in coherency-required mode. 


The icbi instruction invalidates the block at EA ((rA|0) + (rB)). If the processor is a
multiprocessor implementation (for example, the 601, 604, or 620) and the block is marked 
coherency-required, the processor will send an address-only broadcast to other processors 
causing those processors to invalidate the block from their instruction caches. 


For faster processing, many implementations will not compare the entire EA ((rA|0) + (rB))
with the tag in the instruction cache. Instead, they will use the bits in the EA to locate the 
set that the block is in, and invalidate all blocks in that set. 
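To illustrate the set-based shortcut described above, the sketch below computes a set index from an EA. The cache geometry (32-byte blocks, 128 sets) is a hypothetical example, not a property of any particular processor; real values come from the processor's user's manual.

```c
#include <stdint.h>

/* Hypothetical geometry, for illustration only. */
#define BLOCK_SHIFT 5u    /* 32-byte cache blocks */
#define NUM_SETS    128u  /* must be a power of two for the mask below */

/* Set index an implementation of this kind would derive from the EA.
 * Invalidating "all blocks in that set" also discards other blocks
 * whose EAs map to the same index. */
static uint32_t icache_set_index(uint32_t ea)
{
    return (ea >> BLOCK_SHIFT) & (NUM_SETS - 1u);
}
```

Under this geometry, EAs 4 KB apart (32 × 128 bytes) share a set, so an icbi for one of them may also invalidate the other.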


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
VEA | No | No | X



isync isync 


Instruction Synchronize 
isync 


[POWER mnemonic: ics] 


[_] Reserved 





19 00000 00000 00000 150 0
0 5 6 10 11 15 16 20 21 30 31 


The isync instruction provides an ordering function for the effects of all instructions
executed by a processor. Executing an isync instruction ensures that all instructions
preceding the isync instruction have completed before the isync instruction completes,
except that memory accesses caused by those instructions need not have been performed
with respect to other processors and mechanisms. It also ensures that no subsequent
instructions are initiated by the processor until after the isync instruction completes.
Finally, it causes the processor to discard any prefetched instructions, with the effect that 
subsequent instructions will be fetched and executed in the context established by the 
instructions preceding the isync instruction. The isync instruction has no effect on the other 
processors or on their caches. 


This instruction is context synchronizing. 


Context synchronization is necessary after certain code sequences that perform complex 
operations within the processor. These code sequences are usually operating system tasks 
that involve memory management. For example, if an instruction A changes the memory 
translation rules in the memory management unit (MMU), the isync instruction should be
executed so that the instructions following instruction A will be discarded from the pipeline 
and refetched according to the new translation rules. 
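A minimal sketch of emitting isync from C, assuming a GCC-compatible compiler with inline assembly. On non-PowerPC targets it falls back to a compiler-only barrier so the example still builds; the helper name context_sync is hypothetical. Supervisor-level code would typically call it immediately after the instruction that changes the translation rules.

```c
/* Hypothetical helper: emits isync on PowerPC targets (GCC-style asm). */
static inline void context_sync(void)
{
#if defined(__powerpc__) || defined(__ppc__) || defined(__PPC__)
    __asm__ volatile("isync" ::: "memory");
#else
    __asm__ volatile("" ::: "memory");  /* compiler barrier only */
#endif
}
```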


Note that all exceptions and the rfi instruction are also context synchronizing. 


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
VEA | No | No | XL



lbz lbz


Load Byte and Zero


lbz rD,d(rA)
34 D A d
0 5 6 10 11 15 16 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + EXTS(d)
rD ← (24)0 || MEM(EA, 1)


EA is the sum (rA|0) + d. The byte in memory addressed by EA is loaded into the low-order
eight bits of rD. The remaining bits in rD are cleared. 
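The (rA|0) rule and the zero extension can be modeled in a few lines of C. In the sketch below, gpr and mem are hypothetical stand-ins for the 32 GPRs and a small byte-addressed memory (addresses wrap modulo its size); only the data path is modeled, not translation or exceptions.

```c
#include <stdint.h>

static uint32_t gpr[32];  /* hypothetical stand-in for the GPRs */
static uint8_t  mem[256]; /* hypothetical byte-addressed memory */

/* Model of lbz rD,d(rA). */
static void lbz(unsigned rD, int16_t d, unsigned rA)
{
    uint32_t b  = (rA == 0) ? 0 : gpr[rA];    /* (rA|0): r0 means 0 */
    uint32_t ea = b + (uint32_t)(int32_t)d;   /* EXTS(d) */
    gpr[rD] = mem[ea % sizeof mem];           /* (24)0 || MEM(EA, 1) */
}
```

Note that lbz r5,5(r0) reads address 5 regardless of what r0 contains.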


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | D



lbzu lbzu


Load Byte and Zero with Update


lbzu rD,d(rA)
35 D A d
0 5 6 10 11 15 16 31
EA ← (rA) + EXTS(d)
rD ← (24)0 || MEM(EA, 1)
rA ← EA


EA is the sum (rA) + d. The byte in memory addressed by EA is loaded into the low-order 
eight bits of rD. The remaining bits in rD are cleared. 


EA is placed into rA. 


If rA = 0, or rA = rD, the instruction form is invalid. 
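The update form differs in two ways: rA is always used as a register (there is no (rA|0) rule), and EA is written back to rA. The hypothetical model below (gpr and mem are stand-ins as before) returns false for the invalid forms instead of executing them, since software must not rely on their results.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t gpr[32];  /* hypothetical stand-in for the GPRs */
static uint8_t  mem[256]; /* hypothetical byte-addressed memory */

/* Model of lbzu rD,d(rA); false means an invalid form was rejected. */
static bool lbzu(unsigned rD, int16_t d, unsigned rA)
{
    if (rA == 0 || rA == rD)
        return false;
    uint32_t ea = gpr[rA] + (uint32_t)(int32_t)d;
    gpr[rD] = mem[ea % sizeof mem];   /* zero-extended byte */
    gpr[rA] = ea;                     /* update: EA written back */
    return true;
}
```

Repeating lbzu r4,1(r3) with r3 initially pointing one byte before a buffer walks through the buffer one byte at a time.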
Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | D



lbzux lbzux


Load Byte and Zero with Update Indexed


lbzux rD,rA,rB
[_] Reserved
31 D A B 119 0
0 5 6 10 11 15 16 20 21 30 31


EA ← (rA) + (rB)
rD ← (24)0 || MEM(EA, 1)
rA ← EA


EA is the sum (rA) + (rB). The byte in memory addressed by EA is loaded into the low- 
order eight bits of rD. The remaining bits in rD are cleared. 


EA is placed into rA. 


If rA = 0 or rA = rD, the instruction form is invalid.
Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X

8-102 PowerPC Microprocessor Family: The Programming Environments (32-Bit) 


lbzx lbzx


Load Byte and Zero Indexed


lbzx rD,rA,rB
[_] Reserved
31 D A B 87 0
0 5 6 10 11 15 16 20 21 30 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
rD ← (24)0 || MEM(EA, 1)


EA is the sum (rA|0) + (rB). The byte in memory addressed by EA is loaded into the
low-order eight bits of rD. The remaining bits in rD are cleared.


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X



lfd lfd


Load Floating-Point Double
lfd frD,d(rA)
50 D A d
0 5 6 10 11 15 16 31
if rA = 0 then b ← 0
else b ← (rA)
EA ← b + EXTS(d)
frD ← MEM(EA, 8)
EA is the sum (rA|0) + d.


The double word in memory addressed by EA is placed into frD. 
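Because lfd places the doubleword into frD without conversion, it can be modeled as a bit copy. The sketch assumes big-endian memory order (the byte at EA is the most-significant byte) and a host whose double is IEEE 754 binary64; lfd_from_bytes is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the lfd data path: p points at the bytes at EA. */
static double lfd_from_bytes(const uint8_t *p)
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | p[i];     /* byte at EA is most significant */
    double frD;
    memcpy(&frD, &bits, sizeof frD);   /* bit copy, no conversion */
    return frD;
}
```

The bytes 3F F0 00 00 00 00 00 00, for example, load as 1.0.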


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | D



lfdu lfdu


Load Floating-Point Double with Update
lfdu frD,d(rA)
51 D A d
0 5 6 10 11 15 16 31


EA ← (rA) + EXTS(d)
frD ← MEM(EA, 8)
rA ← EA


EA is the sum (rA) + d. 














The double word in memory addressed by EA is placed into frD. 
EA is placed into rA. 


If rA = 0, the instruction form is invalid. 
Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | D



lfdux lfdux 


Load Floating-Point Double with Update Indexed 


lfdux frD,rA,rB

[_] Reserved
31 D A B 631 0
0 5 6 10 11 15 16 20 21 30 31


EA ← (rA) + (rB)
frD ← MEM(EA, 8)
rA ← EA


EA is the sum (rA) + (rB). 
The double word in memory addressed by EA is placed into frD. 
EA is placed into rA. 


If rA = 0, the instruction form is invalid. 
Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X

lfdx lfdx


Load Floating-Point Double Indexed


lfdx frD,rA,rB
[_] Reserved
31 D A B 599 0
0 5 6 10 11 15 16 20 21 30 31
if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
frD ← MEM(EA, 8)
EA is the sum (rA|0) + (rB).
The double word in memory addressed by EA is placed into frD.
Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X



lfs lfs


Load Floating-Point Single
lfs frD,d(rA)
48 D A d
0 5 6 10 11 15 16 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + EXTS(d)
frD ← DOUBLE(MEM(EA, 4))


EA is the sum (rA|0) + d.


The word in memory addressed by EA is interpreted as a floating-point single-precision 
operand. This word is converted to floating-point double-precision (see Section D.6, 
“Floating-Point Load Instructions”) and placed into frD. 
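For finite values and infinities, the single-to-double conversion gives the same value as a C float-to-double conversion, which is what the sketch below relies on. It is not a faithful model for signaling NaNs: the load itself raises no floating-point exception, whereas a C conversion may quiet an SNaN. The byte order is assumed big-endian and lfs_from_bytes is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the lfs data path for non-NaN values. */
static double lfs_from_bytes(const uint8_t *p)
{
    uint32_t bits = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                    ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    float single;
    memcpy(&single, &bits, sizeof single);  /* reinterpret the word */
    return (double)single;                  /* widen to double format */
}
```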


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | D

lfsu lfsu


Load Floating-Point Single with Update
lfsu frD,d(rA)
49 D A d
0 5 6 10 11 15 16 31
EA ← (rA) + EXTS(d)
frD ← DOUBLE(MEM(EA, 4))
rA ← EA


EA is the sum (rA) + d.


The word in memory addressed by EA is interpreted as a floating-point single-precision 
operand. This word is converted to floating-point double-precision (see Section D.6, 
“Floating-Point Load Instructions”) and placed into frD. 


EA is placed into rA. 


If rA = 0, the instruction form is invalid. 
Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | D



lfsux lfsux


Load Floating-Point Single with Update Indexed


lfsux frD,rA,rB
[_] Reserved
31 D A B 567 0
0 5 6 10 11 15 16 20 21 30 31


EA ← (rA) + (rB)
frD ← DOUBLE(MEM(EA, 4))
rA ← EA


EA is the sum (rA) + (rB). 


The word in memory addressed by EA is interpreted as a floating-point single-precision 
operand. This word is converted to floating-point double-precision (see Section D.6, 
“Floating-Point Load Instructions”) and placed into frD. 


EA is placed into rA. 


If rA = 0, the instruction form is invalid. 
Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X

lfsx lfsx


Load Floating-Point Single Indexed


lfsx frD,rA,rB
[_] Reserved
31 D A B 535 0
0 5 6 10 11 15 16 20 21 30 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
frD ← DOUBLE(MEM(EA, 4))


EA is the sum (rA|0) + (rB).


The word in memory addressed by EA is interpreted as a floating-point single-precision 
operand. This word is converted to floating-point double-precision (see Section D.6, 
“Floating-Point Load Instructions”) and placed into frD. 


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X



lha lha 


Load Half Word Algebraic 


lha rD,d(rA) 
42 D A d
0 5 6 10 11 15 16 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + EXTS(d)
rD ← EXTS(MEM(EA, 2))


EA is the sum (rA|0) + d. The half word in memory addressed by EA is loaded into the
low-order 16 bits of rD. The remaining bits in rD are filled with a copy of the most-significant
bit of the loaded half word.
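The difference between the algebraic (sign-extending) and zero-extending half-word loads is easiest to see side by side. The sketch assumes big-endian byte order; both helper names are hypothetical, and lhz_value mirrors the Load Half Word and Zero operation.

```c
#include <stdint.h>

/* EXTS(MEM(EA, 2)): copy the half word's sign bit into the upper 16 bits. */
static uint32_t lha_value(const uint8_t *p)
{
    uint32_t half = ((uint32_t)p[0] << 8) | p[1];
    return (half & 0x8000u) ? (half | 0xFFFF0000u) : half;
}

/* (16)0 || MEM(EA, 2): upper 16 bits cleared. */
static uint32_t lhz_value(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | p[1];
}
```

For the bytes FF FE, lha yields 0xFFFFFFFE (that is, -2) while lhz yields 0x0000FFFE.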

Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | D

lhau lhau


Load Half Word Algebraic with Update


lhau rD,d(rA)
43 D A d
0 5 6 10 11 15 16 31


EA ← (rA) + EXTS(d)
rD ← EXTS(MEM(EA, 2))
rA ← EA


EA is the sum (rA) + d. The half word in memory addressed by EA is loaded into the
low-order 16 bits of rD. The remaining bits in rD are filled with a copy of the most-significant
bit of the loaded half word.
EA is placed into rA.


If rA = 0 or rA = rD, the instruction form is invalid.


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | D



lhaux lhaux


Load Half Word Algebraic with Update Indexed


lhaux rD,rA,rB

[_] Reserved
31 D A B 375 0
0 5 6 10 11 15 16 20 21 30 31


EA ← (rA) + (rB)
rD ← EXTS(MEM(EA, 2))
rA ← EA


EA is the sum (rA) + (rB). The half word in memory addressed by EA is loaded into the 
low-order 16 bits of rD. The remaining bits in rD are filled with a copy of the most- 
significant bit of the loaded half word. 


EA is placed into rA. 


If rA = 0 or rA = rD, the instruction form is invalid.
Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X

lhax lhax


Load Half Word Algebraic Indexed


lhax rD,rA,rB
[_] Reserved
31 D A B 343 0
0 5 6 10 11 15 16 20 21 30 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
rD ← EXTS(MEM(EA, 2))


EA is the sum (rAl0) + (rB). The half word in memory addressed by EA is loaded into the 
low-order 16 bits of rD. The remaining bits in rD are filled with a copy of the most- 
significant bit of the loaded half word. 


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X



lhbrx lhbrx


Load Half Word Byte-Reverse Indexed


lhbrx rD,rA,rB
[_] Reserved
31 D A B 790 0
0 5 6 10 11 15 16 20 21 30 31
if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
rD ← (16)0 || MEM(EA + 1, 1) || MEM(EA, 1)


EA is the sum (rA|0) + (rB). Bits 0-7 of the half word in memory addressed by EA are
loaded into the low-order eight bits of rD. Bits 8-15 of the half word in memory addressed 
by EA are loaded into the subsequent low-order eight bits of rD. The remaining bits in rD 
are cleared. 


The PowerPC architecture cautions programmers that some implementations of the 
architecture may run the lhbrx instructions with greater latency than other types of load 
instructions. 
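On a big-endian processor, lhbrx is the natural way to read a little-endian 16-bit field. The sketch below shows the byte order it produces; lhbrx_value is a hypothetical name and only the data path is modeled.

```c
#include <stdint.h>

/* (16)0 || MEM(EA + 1, 1) || MEM(EA, 1): the byte at EA becomes the
 * low-order byte of rD. */
static uint32_t lhbrx_value(const uint8_t *p)
{
    return ((uint32_t)p[1] << 8) | p[0];
}
```

The bytes 34 12 in memory therefore load as 0x1234.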


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X

lhz lhz


Load Half Word and Zero
lhz rD,d(rA)
40 D A d
0 5 6 10 11 15 16 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + EXTS(d)
rD ← (16)0 || MEM(EA, 2)
EA is the sum (rA|0) + d. The half word in memory addressed by EA is loaded into the
low-order 16 bits of rD. The remaining bits in rD are cleared.


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | D



lhzu lhzu


Load Half Word and Zero with Update


lhzu rD,d(rA)
41 D A d
0 5 6 10 11 15 16 31
EA ← (rA) + EXTS(d)
rD ← (16)0 || MEM(EA, 2)
rA ← EA


EA is the sum (rA) + d. The half word in memory addressed by EA is loaded into the low- 
order 16 bits of rD. The remaining bits in rD are cleared. 


EA is placed into rA. 


If rA = 0 or rA = rD, the instruction form is invalid.
Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | D

lhzux lhzux


Load Half Word and Zero with Update Indexed


lhzux rD,rA,rB
[_] Reserved
31 D A B 311 0
0 5 6 10 11 15 16 20 21 30 31
EA ← (rA) + (rB)
rD ← (16)0 || MEM(EA, 2)
rA ← EA


EA is the sum (rA) + (rB). The half word in memory addressed by EA is loaded into the 
low-order 16 bits of rD. The remaining bits in rD are cleared. 


EA is placed into rA. 


If rA = 0 or rA = rD, the instruction form is invalid.
Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X



lhzx lhzx


Load Half Word and Zero Indexed


lhzx rD,rA,rB
[_] Reserved
31 D A B 279 0
0 5 6 10 11 15 16 20 21 30 31

if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
rD ← (16)0 || MEM(EA, 2)











EA is the sum (rA|0) + (rB). The half word in memory addressed by EA is loaded into the
low-order 16 bits of rD. The remaining bits in rD are cleared. 


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X

lmw lmw


Load Multiple Word


lmw rD,d(rA)
[POWER mnemonic: lm]

46 D A d
0 5 6 10 11 15 16 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + EXTS(d)
r ← rD
do while r ≤ 31
   GPR(r) ← MEM(EA, 4)
   r ← r + 1
   EA ← EA + 4
EA is the sum (rA|0) + d.
n = (32 - rD).


n consecutive words starting at EA are loaded into GPRs rD through r31. 
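The register loop in the description can be modeled directly. In this hypothetical sketch, gpr stands for the GPRs and src for memory starting at EA, read as big-endian words; the alignment and invalid-form checks are omitted.

```c
#include <stdint.h>

static uint32_t gpr[32];  /* hypothetical stand-in for the GPRs */

/* Model of lmw: loads n = 32 - rD words into rD through r31. */
static void lmw_model(unsigned rD, const uint8_t *src)
{
    for (unsigned r = rD; r <= 31; r++, src += 4)
        gpr[r] = ((uint32_t)src[0] << 24) | ((uint32_t)src[1] << 16) |
                 ((uint32_t)src[2] << 8)  |  (uint32_t)src[3];
}
```

With rD = r29, exactly three words (12 bytes) are read.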


EA must be a multiple of four. If it is not, either the system alignment exception handler is 
invoked or the results are boundedly undefined. For additional information about alignment 
and DSI exceptions, see Section 6.4.3, “DSI Exception (0x00300).” 


If rA is in the range of registers specified to be loaded, including the case in which rA = 0, 
the instruction form is invalid. 


Note that, in some implementations, this instruction is likely to have a greater latency and 
take longer to execute, perhaps much longer, than a sequence of individual load or store 
instructions that produce the same results. 


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | D



lswi lswi


Load String Word Immediate


lswi rD,rA,NB
[POWER mnemonic: lsi]


[_] Reserved
31 D A NB 597 0
0 5 6 10 11 15 16 20 21 30 31


if rA = 0 then EA ← 0
else EA ← (rA)
if NB = 0 then n ← 32
else n ← NB
r ← rD - 1
i ← 0
do while n > 0
   if i = 0 then
      r ← r + 1 (mod 32)
      GPR(r) ← 0
   GPR(r)[i-i + 7] ← MEM(EA, 1)
   i ← i + 8
   if i = 32 then i ← 0
   EA ← EA + 1
   n ← n - 1


EA is (rA|0).

Let n = NB if NB ≠ 0, n = 32 if NB = 0; n is the number of bytes to load.
Let nr = CEIL(n ÷ 4); nr is the number of registers to be loaded with data.


n consecutive bytes starting at EA are loaded into GPRs rD through rD + nr - 1.


Bytes are loaded left to right in each register. The sequence of registers wraps around to r0
if required. If the 4 bytes of register rD + nr - 1 are only partially filled, the unfilled
low-order byte(s) of that register are cleared.
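The register wraparound and byte clearing can be checked with a small model. In the hypothetical sketch below, gpr stands for the GPRs and src for memory starting at EA; invalid-form and alignment handling are omitted. Each register is cleared before its first byte is written, which is what leaves the unfilled low-order bytes of the last register zero.

```c
#include <stdint.h>

static uint32_t gpr[32];  /* hypothetical stand-in for the GPRs */

/* Model of lswi rD,rA,NB; src points at the bytes starting at EA. */
static void lswi_model(unsigned rD, const uint8_t *src, unsigned NB)
{
    unsigned n = (NB == 0) ? 32 : NB;  /* bytes to load */
    unsigned r = (rD + 31) % 32;       /* r <- rD - 1 (mod 32) */
    unsigned i = 0;                    /* bit offset within GPR(r) */

    while (n > 0) {
        if (i == 0) {                  /* start the next register */
            r = (r + 1) % 32;
            gpr[r] = 0;
        }
        gpr[r] |= (uint32_t)*src++ << (24 - i);  /* GPR(r)[i..i+7] */
        i += 8;
        if (i == 32)
            i = 0;
        n--;
    }
}
```

Loading 10 bytes into r30 fills r30 and r31, then wraps to r0, whose two low-order bytes are cleared.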


If rA is in the range of registers specified to be loaded, including the case in which rA = 0, 
the instruction form is invalid. 


Under certain conditions (for example, segment boundary crossing) the data alignment 
exception handler may be invoked. For additional information about data alignment 
exceptions, see Section 6.4.3, “DSI Exception (0x00300).” 
Note that, in some implementations, this instruction is likely to have greater latency and
take longer to execute, perhaps much longer, than a sequence of individual load or store 
instructions that produce the same results. 


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X



lswx lswx
Load String Word Indexed


lswx rD,rA,rB


[POWER mnemonic: lsx]


[_] Reserved


31 D A B 533 0
0 5 6 10 11 15 16 20 21 30 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
n ← XER[25-31]
r ← rD - 1
i ← 0
rD ← undefined
do while n > 0
   if i = 0 then
      r ← r + 1 (mod 32)
      GPR(r) ← 0
   GPR(r)[i-i + 7] ← MEM(EA, 1)
   i ← i + 8
   if i = 32 then i ← 0
   EA ← EA + 1
   n ← n - 1
EA is the sum (rA|0) + (rB). Let n = XER[25-31]; n is the number of bytes to load. Let
nr = CEIL(n ÷ 4); nr is the number of registers to receive data. If n > 0, n consecutive bytes
starting at EA are loaded into GPRs rD through rD + nr - 1.


Bytes are loaded left to right in each register. The sequence of registers wraps around
through r0 if required. If the four bytes of rD + nr - 1 are only partially filled, the unfilled
low-order byte(s) of that register are cleared. If n = 0, the contents of rD are undefined.
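Two small computations recur when using lswx: extracting the byte count from XER and computing nr. Since PowerPC numbers bits from 0 at the most-significant end, XER[25-31] are the seven least-significant bits of the 32-bit XER. The helper names below are hypothetical.

```c
#include <stdint.h>

/* n = XER[25-31]: the seven least-significant bits of the XER. */
static unsigned lswx_byte_count(uint32_t xer)
{
    return xer & 0x7Fu;
}

/* nr = CEIL(n / 4): number of registers that receive data (0 if n = 0). */
static unsigned lswx_register_count(unsigned n)
{
    return (n + 3u) / 4u;
}
```

A byte count of 9, for example, needs three registers, the last holding one byte followed by three cleared bytes.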


If rA or rB is in the range of registers specified to be loaded, including the case in which 
rA = 0, either the system illegal instruction error handler is invoked or the results are 
boundedly undefined. 


If rD = rA or rD = rB, the instruction form is invalid.


If rD and rA both specify GPR0, the form is invalid.
Under certain conditions (for example, segment boundary crossing) the data alignment
exception handler may be invoked. For additional information about data alignment 
exceptions, see Section 6.4.3, “DSI Exception (0x00300).” 


Note that, in some implementations, this instruction is likely to have a greater latency and 
take longer to execute, perhaps much longer, than a sequence of individual load or store 
instructions that produce the same results. 


Other registers altered:
• None

PowerPC Architecture Level | Supervisor Level | Optional | Form
UISA | No | No | X



lwarx lwarx


Load Word and Reserve Indexed


lwarx rD,rA,rB
[_] Reserved
31 D A B 20 0
0 5 6 10 11 15 16 20 21 30 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
RESERVE ← 1
RESERVE_ADDR ← physical_addr(EA)
rD ← MEM(EA, 4)


EA is the sum (rA|0) + (rB).
The word in memory addressed by EA is loaded into rD. 


This instruction creates a reservation for use by a Store Word Conditional Indexed
(stwcx.) instruction. The physical address computed from EA is associated with the
reservation, and replaces any address previously associated with the reservation.


EA must be a multiple of four. If it is not, either the system alignment exception handler is 
invoked or the results are boundedly undefined. For additional information about alignment 
and DSI exceptions, see Section 6.4.3, “DSI Exception (0x00300).” 


When the RESERVE bit is set, the processor enables hardware snooping for the block of
memory addressed by the RESERVE address. If the processor detects that another
processor writes to the block of memory it has reserved, it clears the RESERVE bit. The
stwcx. instruction performs the store only if the RESERVE bit is set. The stwcx. instruction
sets the CR0[EQ] bit if the store was successful and clears it if it failed. The lwarx and stwcx.
combination can be used for atomic read-modify-write sequences. Note that the atomic
sequence is not guaranteed to succeed, but its failure can be detected if CR0[EQ] = 0 after
the stwcx. instruction.
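To make the reservation protocol concrete, the following C sketch models a single processor's RESERVE bit and RESERVE address, plus a store from another processor that clears the reservation. All names are hypothetical, memory is a word array whose index stands in for the physical address, and a real reservation covers a block of memory rather than one word; requiring the stwcx. address to match the reserved one is likewise a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one processor's reservation state. */
struct resv_cpu { bool reserve; uint32_t reserve_addr; };

/* lwarx: load the word and set the reservation. */
static uint32_t lwarx_model(struct resv_cpu *c, const uint32_t *mem, uint32_t idx)
{
    c->reserve = true;          /* RESERVE <- 1 */
    c->reserve_addr = idx;      /* RESERVE_ADDR <- physical_addr(EA) */
    return mem[idx];
}

/* stwcx.: store only if the reservation still holds; the return value is
   what CR0[EQ] would report. */
static bool stwcx_model(struct resv_cpu *c, uint32_t *mem, uint32_t idx, uint32_t val)
{
    bool ok = c->reserve && c->reserve_addr == idx;
    if (ok)
        mem[idx] = val;
    c->reserve = false;         /* the reservation is cleared either way */
    return ok;
}

/* A store by another processor to the reserved block clears RESERVE (snooping). */
static void remote_store(struct resv_cpu *c, uint32_t *mem, uint32_t idx, uint32_t val)
{
    mem[idx] = val;
    if (c->reserve && c->reserve_addr == idx)
        c->reserve = false;
}

/* Atomic increment: retry the lwarx/stwcx. pair until CR0[EQ] = 1. */
static uint32_t atomic_inc(struct resv_cpu *c, uint32_t *mem, uint32_t idx)
{
    uint32_t v;
    do {
        v = lwarx_model(c, mem, idx);
    } while (!stwcx_model(c, mem, idx, v + 1));
    return v + 1;
}
```

The retry loop in `atomic_inc` is the usual shape of a read-modify-write sequence: the failure case (CR0[EQ] = 0) is not an error, just a signal to start over.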


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 
lwbrx lwbrx


Load Word Byte-Reverse Indexed 


lwbrx rD,rA,rB
[POWER mnemonic: lbrx]








[_] Reserved 
31 D A B 534 0 
0 5 6 10 11 15 16 20 21 30 31 
if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
rD ← MEM(EA + 3, 1) || MEM(EA + 2, 1) || MEM(EA + 1, 1) || MEM(EA, 1)


EA is the sum (rA|0) + (rB). Bits 0-7 of the word in memory addressed by EA are loaded
into rD[24-31]. Bits 8-15 of the word in memory addressed by EA are loaded into
rD[16-23]. Bits 16-23 of the word in memory addressed by EA are loaded into rD[8-15].
Bits 24-31 of the word in memory addressed by EA are loaded into rD[0-7].
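Put differently, lwbrx performs a little-endian word load. A minimal C sketch, assuming a hypothetical byte array `mem` indexed directly by EA (`lwbrx_model` is likewise a hypothetical name):

```c
#include <stdint.h>

/* The byte at EA lands in rD[24-31] (least significant) and the byte at
   EA + 3 lands in rD[0-7] (most significant). */
static uint32_t lwbrx_model(const uint8_t *mem, uint32_t ea)
{
    return (uint32_t)mem[ea + 3] << 24 |
           (uint32_t)mem[ea + 2] << 16 |
           (uint32_t)mem[ea + 1] << 8  |
           (uint32_t)mem[ea];
}
```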


The PowerPC architecture cautions programmers that some implementations of the
architecture may run the lwbrx instruction with greater latency than other types of load
instructions.


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 






















lwz lwz


Load Word and Zero 


lwz rD,d(rA) 
[POWER mnemonic: l]





32 D A d
0 5 6 10 11 15 16 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + EXTS(d)
rD ← MEM(EA, 4)


EA is the sum (rA|0) + d. The word in memory addressed by EA is loaded into rD.
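The (rA|0) convention is easy to misread: a 0 in the rA field selects the value 0, not the contents of r0. A minimal C sketch of the D-form address calculation, where `dform_ea` and the `gpr` array are hypothetical names:

```c
#include <stdint.h>

/* EA = (rA|0) + EXTS(d), computed modulo 2^32. */
static uint32_t dform_ea(const uint32_t gpr[32], unsigned ra, int16_t d)
{
    uint32_t b = (ra == 0) ? 0 : gpr[ra];   /* rA field 0 means literal 0 */
    return b + (uint32_t)(int32_t)d;        /* EXTS(d): sign-extend to 32 bits */
}
```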


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 


UISA D 
lwzu lwzu


Load Word and Zero with Update 


lwzu rD,d(rA)

[POWER mnemonic: lu] 

33 D A d

0 5 6 10 11 15 16 31 
EA ← (rA) + EXTS(d)
rD ← MEM(EA, 4)
rA ← EA


EA is the sum (rA) + d. The word in memory addressed by EA is loaded into rD. 
EA is placed into rA. 


If rA = 0 or rA = rD, the instruction form is invalid.
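The update form differs from lwz in two ways: it always uses (rA), never (rA|0), and it writes EA back to rA, which is why rA = 0 and rA = rD are invalid. A sketch in C, assuming a hypothetical big-endian byte array `mem` indexed by EA and a hypothetical `lwzu_model` helper that reports an invalid form by returning false:

```c
#include <stdbool.h>
#include <stdint.h>

static bool lwzu_model(uint32_t gpr[32], const uint8_t *mem,
                       unsigned rd, unsigned ra, int16_t d)
{
    if (ra == 0 || ra == rd)
        return false;                               /* invalid instruction form */
    uint32_t ea = gpr[ra] + (uint32_t)(int32_t)d;   /* (rA) + EXTS(d); no (rA|0) */
    gpr[rd] = (uint32_t)mem[ea] << 24 | (uint32_t)mem[ea + 1] << 16 |
              (uint32_t)mem[ea + 2] << 8 | (uint32_t)mem[ea + 3];
    gpr[ra] = ea;                                   /* EA is placed into rA */
    return true;
}
```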
Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA D 






















lwzux lwzux


Load Word and Zero with Update Indexed 


lwzux rD,rA,rB 
[POWER mnemonic: lux] 


[_] Reserved 





31 D A B 55 0 
0 5 6 10 11 15 16 20 21 30 31 
EA ← (rA) + (rB)
rD ← MEM(EA, 4)
rA ← EA


EA is the sum (rA) + (rB). The word in memory addressed by EA is loaded into rD. 


EA is placed into rA. 


If rA = 0 or rA = rD, the instruction form is invalid.
Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 














UISA X 
lwzx lwzx


Load Word and Zero Indexed 


lwzx rD,rA,rB 


[POWER mnemonic: lx]


[_] Reserved 
31 D A B 23 0
0 5 6 10 11 15 16 20 21 30 31 


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
rD ← MEM(EA, 4)


EA is the sum (rA|0) + (rB). The word in memory addressed by EA is loaded into rD.


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 






















mcrf mcrf


Move Condition Register Field 


mcrf crfD,crfS
[_] Reserved 
19 crfD 00 crfS 00 00000 0 0
0 5 6 8 9 10 11 13 14 15 16 20 21 30 31


CR[4 * crfD–4 * crfD + 3] ← CR[4 * crfS–4 * crfS + 3]
The contents of condition register field crfS are copied into condition register field crfD.
All other condition register fields remain unchanged.
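With IBM bit numbering, CR field 0 (CR bits 0-3) is the most-significant nibble of the 32-bit CR. A minimal C sketch of the field copy under that convention (`mcrf_model` is a hypothetical name):

```c
#include <stdint.h>

/* Copy CR field crfS into CR field crfD; field f sits 28 - 4f bits from the LSB. */
static uint32_t mcrf_model(uint32_t cr, unsigned crfD, unsigned crfS)
{
    unsigned shS = 28 - 4 * crfS, shD = 28 - 4 * crfD;
    uint32_t field = (cr >> shS) & 0xFu;
    cr &= ~(0xFu << shD);          /* clear the destination field */
    return cr | (field << shD);    /* all other fields are unchanged */
}
```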
Other registers altered: 
• Condition Register (CR field specified by operand crfD):


Affected: LT, GT, EQ, SO 


PowerPC Architecture Level Supervisor Level Optional Form 





UISA XL 
mcrfs mcrfs


Move to Condition Register from FPSCR 








mcrfs crfD,crfS
[_] Reserved 
63 crfD 00 crfS 00 00000 64 0 
0 5 6 8 9 10 11 13 14 15 16 20 21 30 31


The contents of FPSCR field crfS are copied to CR field crfD. All exception bits copied
(except FEX and VX) are cleared in the FPSCR.
Other registers altered:
• Condition Register (CR field specified by operand crfD):
Affected: FX, FEX, VX, OX
• Floating-Point Status and Control Register:
Affected: FX, OX (if crfS = 0)
Affected: UX, ZX, XX, VXSNAN (if crfS = 1)
Affected: VXISI, VXIDI, VXZDZ, VXIMZ (if crfS = 2)
Affected: VXVC (if crfS = 3)
Affected: VXSOFT, VXSQRT, VXCVI (if crfS = 5)


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 






















mcrxr mcrxr 


Move to Condition Register from XER 


mcrxr crfD 
[_] Reserved 
31 crfD 00 00000 00000 512 0
0 5 6 8 9 10 11 15 16 20 21 30 31


CR[4 * crfD–4 * crfD + 3] ← XER[0–3]
XER[0–3] ← 0b0000
The contents of XER[0-3] are copied into the condition register field designated by crfD.
All other fields of the condition register remain unchanged. XER[0-3] is cleared.
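XER[0-3] holds SO, OV, CA, and a reserved bit, which together form the most-significant nibble of the XER. A C sketch of the copy-and-clear, assuming a hypothetical `struct regs` holding CR and XER (`mcrxr_model` is likewise hypothetical):

```c
#include <stdint.h>

struct regs { uint32_t cr, xer; };

static void mcrxr_model(struct regs *r, unsigned crfD)
{
    unsigned sh = 28 - 4 * crfD;       /* position of CR field crfD */
    uint32_t top = r->xer >> 28;       /* XER[0-3] */
    r->cr = (r->cr & ~(0xFu << sh)) | (top << sh);
    r->xer &= 0x0FFFFFFFu;             /* XER[0-3] <- 0b0000 */
}
```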
Other registers altered: 
• Condition Register (CR field specified by operand crfD):
Affected: LT, GT, EQ, SO
• XER[0-3]


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 
mfcr mfcr


Move from Condition Register 





mfcr rD 
[_] Reserved 
31 D 00000 00000 19 0
0 5 6 10 11 15 16 20 21 30 31
rD ← CR


The contents of the condition register (CR) are placed into rD. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 






















mffsx mffsx 


Move from FPSCR 

















mffs frD (Rc = 0)
mffs. frD (Rc = 1)
[_] Reserved
63 D 00000 00000 583 Rc
0 5 6 10 11 15 16 20 21 30 31


frD[32–63] ← FPSCR
The contents of the floating-point status and control register (FPSCR) are placed into the 
low-order bits of register frD. The high-order bits of register frD are undefined. 
Other registers altered: 


• Condition Register (CR1 field):





Affected: FX, FEX, VX, OX (if Rc = 1)
PowerPC Architecture Level Supervisor Level Optional Form 
UISA X 
mfmsr mfmsr


Move from Machine State Register 








mfmsr rD 
[_] Reserved 
31 D 00000 00000 83 0 
0 5 6 10 11 15 16 20 21 30 31
rD ← MSR


The contents of the MSR are placed into rD. 
This is a supervisor-level instruction. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 


OEA V X 

























mfspr mfspr 


Move from Special-Purpose Register 








mfspr rD,SPR 
[_] Reserved 
31 D spr* 339 0 
0 5 6 10 11 20 21 30 31


*Note: This is a split field. 


n ← spr[5–9] || spr[0–4]
rD ← SPR(n)


In the PowerPC UISA, the SPR field denotes a special-purpose register, encoded as shown 
in Table 8-9. The contents of the designated special-purpose register are placed into rD. 


Table 8-9. PowerPC UISA SPR Encodings for mfspr


SPR**                              Register Name
Decimal   spr[5-9]   spr[0-4]
1         00000      00001         XER
8         00000      01000         LR
9         00000      01001         CTR


** Note that the order of the two 5-bit halves of the SPR
number is reversed compared with the actual instruction
coding.





If the SPR field contains any value other than one of the values shown in Table 8-9 (and the 
processor is in user mode), one of the following occurs: 


• The system illegal instruction error handler is invoked.
• The system supervisor-level instruction error handler is invoked.
• The results are boundedly undefined.


Other registers altered: 


• None
Simplified mnemonics:


mfxer rD equivalent to mfspr rD,1 
mflr rD equivalent to mfspr rD,8
mfctr rD equivalent to mfspr rD,9 


In the PowerPC OEA, the SPR field denotes a special-purpose register, encoded as shown 
in Table 8-10. The contents of the designated SPR are placed into rD. 


SPR[0] = 1 if and only if reading the register is supervisor-level. Execution of this
instruction specifying a defined and supervisor-level register when MSR[PR] = 1 results
in a privileged instruction type program exception.


If MSR[PR] = 1, the only effect of executing an instruction with an SPR number that is not
shown in Table 8-10 and has SPR[0] = 1 is to cause a supervisor-level instruction type
program exception or an illegal instruction type program exception. In all other cases
(MSR[PR] = 0 or SPR[0] = 0), if the SPR field contains any value that is not shown in
Table 8-10, either an illegal instruction type program exception occurs or the results are
boundedly undefined.


Other registers altered: 


• None


Table 8-10. PowerPC OEA SPR Encodings for mfspr




Note that the order of the two 5-bit halves of the SPR number is reversed
compared with actual instruction coding.











For mtspr and mfspr instructions, the SPR number coded in assembly 
language does not appear directly as a 10-bit binary number in the 
instruction. The number coded is split into two 5-bit halves that are 
reversed in the instruction, with the high-order five bits appearing in bits 
16-20 of the instruction and the low-order five bits in bits 11-15. 
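The split SPR field is a common source of hand-encoding mistakes. The following C sketch assembles an mfspr instruction word from the field layout shown at the top of this entry (primary opcode 31, extended opcode 339); `encode_mfspr` is a hypothetical helper, and IBM bit k of the instruction corresponds to a left shift of 31 - k. With it, mflr r0 (mfspr r0,8) comes out as 0x7C0802A6.

```c
#include <stdint.h>

static uint32_t encode_mfspr(unsigned rD, unsigned spr)
{
    uint32_t lo5 = spr & 0x1Fu;          /* low-order five bits -> bits 11-15 */
    uint32_t hi5 = (spr >> 5) & 0x1Fu;   /* high-order five bits -> bits 16-20 */
    return (31u << 26)                   /* primary opcode, bits 0-5 */
         | ((uint32_t)rD << 21)          /* rD, bits 6-10 */
         | (lo5 << 16)
         | (hi5 << 11)
         | (339u << 1);                  /* extended opcode, bits 21-30; bit 31 = 0 */
}
```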


PowerPC Architecture Level Supervisor Level Optional Form 


UISA/OEA v XFX 























* Note that mfspr is supervisor-level only if SPR[0] = 1.
mfsr mfsr


Move from Segment Register 














mfsr rD,SR 
[_] Reserved 
31 D 0 SR 00000 595 0
0 5 6 10 11 12 15 16 20 21 30 31


rD ← SEGREG(SR)
The contents of segment register SR are placed into rD. 


This is a supervisor-level instruction. 


This instruction is defined only for 32-bit implementations; using it on a 64-bit 
implementation causes an illegal instruction type program exception. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





OEA V X 






















mfsrin mfsrin 


Move from Segment Register Indirect 





mfsrin rD,rB 
[_] Reserved 
31 D 00000 B 659 0 
0 5 6 10 11 15 16 20 21 30 31


rD ← SEGREG(rB[0–3])
The contents of the segment register selected by bits 0-3 of rB are copied into rD. 


This is a supervisor-level instruction. 


This instruction is defined only for 32-bit implementations. Using it on a 64-bit 
implementation causes an illegal instruction type program exception. 


Note that the rA field is not defined for the mfsrin instruction in the PowerPC architecture. 
However, mfsrin performs the same function in the PowerPC architecture as does the mfsri 
instruction in the POWER architecture (if rA = 0). 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 


OEA V X 
mftb mftb


Move from Time Base 





mftb rD,TBR 
[_] Reserved 
31 D tbr* 371 0 
0 5 6 10 11 20 21 30 31 


*Note: This is a split field. 


n ← tbr[5–9] || tbr[0–4]
if n = 268 then
   rD ← TBL
else if n = 269 then
   rD ← TBU


The contents of TBL or TBU are copied into rD, as designated by the value in TBR, 
encoded as shown in Table 8-11. 


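Table 8-11. TBR Encodings for mftb


TBR**                              Register Name
Decimal   tbr[5-9]   tbr[0-4]
268       01000      01100         TBL
269       01000      01101         TBU


** Note that the order of the two 5-bit halves of the TBR number is reversed compared
with the actual instruction coding.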





If the TBR field contains any value other than one of the values shown in Table 8-11, then 
one of the following occurs: 


• The system illegal instruction error handler is invoked.
• The system supervisor-level instruction error handler is invoked.
• The results are boundedly undefined.


Note that some implementations may implement mftb and mfspr identically; therefore, a
TBR number must not match an SPR number.


For more information on the time base refer to Section 2.2, “PowerPC VEA Register 
Set—Time Base.” 
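Because TBL and TBU are read by separate instructions, TBL can carry into TBU between the two reads. A common remedy is to read TBU, then TBL, then TBU again, and retry if the two TBU values differ. The C sketch below expresses that loop with hypothetical callbacks standing in for mftbu and mftb; the simulated time base (`sim_tbu`, `sim_tbl`), which advances one tick per read, exists only to exercise the loop.

```c
#include <stdint.h>

/* Hypothetical hooks standing in for "mftbu rD" and "mftb rD". */
typedef uint32_t (*tb_read_fn)(void *ctx);

static uint64_t read_timebase(tb_read_fn read_tbu, tb_read_fn read_tbl, void *ctx)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_tbu(ctx);     /* mftbu */
        lo  = read_tbl(ctx);     /* mftb */
        hi2 = read_tbu(ctx);     /* mftbu again */
    } while (hi != hi2);         /* TBL carried into TBU between reads: retry */
    return ((uint64_t)hi << 32) | lo;
}

/* Simulated 64-bit time base for testing: advances by one tick after every read. */
static uint32_t sim_tbu(void *ctx)
{
    uint64_t *tb = ctx;
    uint32_t v = (uint32_t)(*tb >> 32);
    (*tb)++;
    return v;
}

static uint32_t sim_tbl(void *ctx)
{
    uint64_t *tb = ctx;
    uint32_t v = (uint32_t)*tb;
    (*tb)++;
    return v;
}
```

Starting the simulation just below a carry (0xFFFFFFFE) forces one retry; a naive TBU-then-TBL read at that point would have returned 0x00000000FFFFFFFF.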
Other registers altered:


• None


Simplified mnemonics: 





mftb rD equivalent to mftb rD,268 
mftbu rD equivalent to mftb rD,269 
PowerPC Architecture Level Supervisor Level Optional Form 
VEA XFX 
mtcrf mtcrf


Move to Condition Register Fields 





mtcrf CRM,rS 
[_] Reserved 
31 S 0 CRM 0 144 0
0 5 6 10 11 12 19 20 21 30 31
mask ← (4)(CRM[0]) || (4)(CRM[1]) || ... || (4)(CRM[7])
CR ← ((rS) & mask) | (CR & ¬mask)


The contents of rS are placed into the condition register under control of the field mask 
specified by CRM. The field mask identifies the 4-bit fields affected. Let i be an integer in 
the range 0-7. If CRM[i] = 1, CR field i (CR bits 4 * i through 4 * i + 3) is set to the contents
of the corresponding field of rS. 
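A C sketch of the mask expansion, assuming the 8-bit CRM value is passed with CRM[0] as its most-significant bit (`mtcrf_model` is a hypothetical name):

```c
#include <stdint.h>

static uint32_t mtcrf_model(uint32_t cr, uint8_t crm, uint32_t rs)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < 8; i++)
        if (crm & (0x80u >> i))               /* CRM[i] = 1 */
            mask |= 0xF0000000u >> (4 * i);   /* select CR field i */
    return (rs & mask) | (cr & ~mask);
}
```

With CRM = 0xFF the whole CR is replaced, which is the mtcr simplified mnemonic.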


Note that updating a subset of the eight fields of the condition register may have 
substantially poorer performance on some implementations than updating all of the fields. 


Other registers altered: 
• CR fields selected by mask


Simplified mnemonics: 





mtcr rS equivalent to mtcrf 0xFF,rS
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XFX 
mtfsb0x mtfsb0x


Move to FPSCR Bit 0 

















mtfsb0 crbD (Rc = 0)
mtfsb0. crbD (Rc = 1)
[_] Reserved
63 crbD 00000 00000 70 Rc
0 5 6 10 11 15 16 20 21 30 31


Bit crbD of the FPSCR is cleared. 


Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPSCR bit crbD
Note: Bits 1 and 2 (FEX and VX) cannot be explicitly cleared.


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 
mtfsb1x


Move to FPSCR Bit 1 


mtfsb1x









































mtfsb1 crbD (Rc = 0)
mtfsb1. crbD (Rc = 1)
[_] Reserved
63 crbD 00000 00000 38 Rc
0 5 6 10 11 15 16 20 21 30 31
Bit crbD of the FPSCR is set. 
Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPSCR bit crbD and FX
Note: Bits 1 and 2 (FEX and VX) cannot be explicitly set.
PowerPC Architecture Level Supervisor Level Optional Form 
UISA X 
mtfsfx mtfsfx


Move to FPSCR Fields 

















mtfsf FM,frB (Rc = 0) 
mtfsf. FM,frB (Rc = 1)
[_] Reserved
63 0 FM 0 B 711 Rc
0 5 6 7 14 15 16 20 21 30 31


The low-order 32 bits of frB are placed into the FPSCR under control of the field mask 
specified by FM. The field mask identifies the 4-bit fields affected. Let i be an integer in the 
range 0-7. If FM[i] = 1, FPSCR field i (FPSCR bits 4 * i through 4 * i + 3) is set to the 
contents of the corresponding field of the low-order 32 bits of register frB. 


FPSCR[FX] is altered only if FM[0] = 1. 


Updating fewer than all eight fields of the FPSCR may have substantially poorer 
performance on some implementations than updating all the fields. 


When FPSCR[0-3] is specified, bits 0 (FX) and 3 (OX) are set to the values of frB[32] and 
frB[35] (that is, even if this instruction causes OX to change from 0 to 1, FX is set from 
frB[32] and not by the usual rule that FX is set when an exception bit changes from 0 to 1). 
Bits 1 and 2 (FEX and VX) are set according to the usual rule and not from frB[33-34].
Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPSCR fields selected by mask


PowerPC Architecture Level Supervisor Level Optional Form 





UISA XFL 
mtfsfix mtfsfix


Move to FPSCR Field Immediate 





mtfsfi crfD,IMM (Rc = 0)
mtfsfi. crfD,IMM (Rc = 1)
[_] Reserved
63 crfD 00 00000 IMM 0 134 Rc
0 5 6 8 9 10 11 15 16 19 20 21 30 31


FPSCR[crfD] ← IMM
The value of the IMM field is placed into FPSCR field crfD.


FPSCR[FX] is altered only if crfD = 0.


When FPSCR[0-3] is specified, bits 0 (FX) and 3 (OX) are set to the values of IMM[0] and
IMM[3] (that is, even if this instruction causes OX to change from 0 to 1, FX is set from
IMM[0] and not by the usual rule that FX is set when an exception bit changes from 0 to
1). Bits 1 and 2 (FEX and VX) are set according to the usual rule and not from IMM[1-2].
Other registers altered: 
• Condition Register (CR1 field):
Affected: FX, FEX, VX, OX (if Rc = 1)
• Floating-Point Status and Control Register:
Affected: FPSCR field crfD


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 
mtmsr mtmsr


Move to Machine State Register 





mtmsr rS
[_] Reserved
31 S 00000 00000 146 0
0 5 6 10 11 15 16 20 21 30 31
MSR ← (rS)


The contents of rS are placed into the MSR. 


This is a supervisor-level instruction. It is also an execution synchronizing instruction 
except with respect to alterations to the POW and LE bits. Refer to Section 2.3.17, 
“Synchronization Requirements for Special Registers and for Lookaside Buffers,” for more 
information. 


In addition, alterations to the MSR[EE] and MSR[RI] bits are effective as soon as the 
instruction completes. Thus if MSR[EE] = 0 and an external or decrementer exception is 
pending, executing an mtmsr instruction that sets MSR[EE] = 1 will cause the external or 
decrementer exception to be taken before the next instruction is executed, if no higher 
priority exception exists. 


This instruction is defined only for 32-bit implementations. Using it on a 64-bit 
implementation causes an illegal instruction type program exception. 


Other registers altered: 
• MSR


PowerPC Architecture Level Supervisor Level Optional Form 


OEA V X 
mtspr mtspr


Move to Special-Purpose Register 





mtspr SPR,rS 

[_] Reserved 
31 S spr* 467 0
0 5 6 10 11 20 21 30 31


*Note: This is a split field. 


n ← spr[5–9] || spr[0–4]
SPR(n) ← (rS)


In the PowerPC UISA, the SPR field denotes a special-purpose register, encoded as shown 
in Table 8-12. The contents of rS are placed into the designated special-purpose register.


Table 8-12. PowerPC UISA SPR Encodings for mtspr


SPR**                              Register Name
Decimal   spr[5-9]   spr[0-4]
1         00000      00001         XER
8         00000      01000         LR
9         00000      01001         CTR


** Note that the order of the two 5-bit halves of the SPR
number is reversed compared with actual instruction
coding.





If the SPR field contains any value other than one of the values shown in Table 8-12, and 
the processor is operating in user mode, one of the following occurs: 


• The system illegal instruction error handler is invoked.
• The system supervisor instruction error handler is invoked.
• The results are boundedly undefined.

Other registers altered: 


• See Table 8-12.


Simplified mnemonics: 


mtxer rS equivalent to mtspr 1,rS
mtlr rS equivalent to mtspr 8,rS
mtctr rS equivalent to mtspr 9,rS
In the PowerPC OEA, the SPR field denotes a special-purpose register, encoded as shown
in Table 8-13. The contents of rS are placed into the designated special-purpose register.


For this instruction, SPRs TBL and TBU are treated as separate 32-bit registers; setting one 
leaves the other unaltered. 


SPR[0] = 1 if and only if writing the register is a supervisor-level operation.
Execution of this instruction specifying a defined and supervisor-level register when 
MSR[PR] = 1 results in a privileged instruction type program exception. 


If MSR[PR] = 1, the only effect of executing an instruction with an SPR number that
is not shown in Table 8-13 and has SPR[0] = 1 is to cause a privileged instruction type
program exception or an illegal instruction type program exception. In all other cases
(MSR[PR] = 0 or SPR[0] = 0), if the SPR field contains any value that is not shown in
Table 8-13, either an illegal instruction type program exception occurs or the results are
boundedly undefined.


Other registers altered: 
• See Table 8-13.


Table 8-13. PowerPC OEA SPR Encodings for mtspr 







Note that the order of the two 5-bit halves of the SPR number is reversed. For mtspr
and mfspr instructions, the SPR number coded in assembly language does not appear
directly as a 10-bit binary number in the instruction. The number coded is split into two
5-bit halves that are reversed in the instruction, with the high-order five bits appearing
in bits 16-20 of the instruction and the low-order five bits in bits 11-15.





PowerPC Architecture Level Supervisor Level Optional Form 





UISA/OEA v XFX 




















* Note that mtspr is supervisor-level only if SPR[0] = 1.
mtsr mtsr


Move to Segment Register 





mtsr SR,rS 
[_] Reserved 
31 S 0 SR 00000 210 0
0 5 6 10 11 12 15 16 20 21 30 31


SEGREG(SR) ← (rS)
The contents of rS are placed into segment register SR.


This is a supervisor-level instruction. 


This instruction is defined only for 32-bit implementations. Using it on a 64-bit 
implementation causes an illegal instruction type program exception. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form





OEA V X 
mtsrin mtsrin


Move to Segment Register Indirect 


mtsrin rS,rB


[POWER mnemonic: mtsri] 


[_] Reserved 





31 S 00000 B 242 0
0 5 6 10 11 15 16 20 21 30 31 




















SEGREG(rB[0–3]) ← (rS)
The contents of rS are copied to the segment register selected by bits 0-3 of rB. 


This is a supervisor-level instruction. 


This instruction is defined only for 32-bit implementations. Using it on a 64-bit 
implementation causes an illegal instruction type program exception. 


Note that the PowerPC architecture does not define the rA field for the mtsrin instruction. 
However, mtsrin performs the same function in the PowerPC architecture as does the mtsri 
instruction in the POWER architecture (if rA = 0). 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





OEA V X 
mulhwx mulhwx
Multiply High Word 





mulhw rD,rA,rB (Rc = 0) 
mulhw. rD,rA,rB (Rc = 1)
[_] Reserved
31 D A B 0 75 Rc
0 5 6 10 11 15 16 20 21 22 30 31 


prod[0–63] ← (rA) * (rB)
rD ← prod[0–31]


The 32-bit operands are the contents of rA and rB. The high-order 32 bits of the
64-bit product of the operands are placed into rD.


Both the operands and the product are interpreted as signed integers. 
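In C terms, mulhw is the upper half of a signed 32 x 32 to 64-bit multiplication. A minimal sketch (`mulhw_model` is a hypothetical name; the final conversion to int32_t assumes two's-complement wraparound, as GCC and Clang provide):

```c
#include <stdint.h>

static int32_t mulhw_model(int32_t a, int32_t b)
{
    int64_t prod = (int64_t)a * (int64_t)b;      /* exact signed 64-bit product */
    return (int32_t)((uint64_t)prod >> 32);      /* prod[0-31], the high word */
}
```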


This instruction may execute faster on some implementations if rB contains the operand 
having the smaller absolute value. 
Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)


Note: The setting of CR0 bits LT, GT, and EQ is mode-dependent, and reflects
overflow of the 32-bit result.


PowerPC Architecture Level Supervisor Level Optional Form 





UISA XO 
mulhwux mulhwux
Multiply High Word Unsigned 

















mulhwu rD,rA,rB (Rc = 0) 
mulhwu. rD,rA,rB (Rc = 1)
[_] Reserved
31 D A B 0 11 Rc
0 5 6 10 11 15 16 20 21 22 30 31 


prod[0–63] ← (rA) * (rB)
rD ← prod[0–31]


The 32-bit operands are the contents of rA and rB. The high-order 32 bits of the 64-bit 
product of the operands are placed into rD. 


Both the operands and the product are interpreted as unsigned integers, except that if
Rc = 1 the first three bits of the CR0 field are set by signed comparison of the result to zero.
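The unsigned variant differs from mulhw only in how the operands are extended, so the same bit patterns can yield different high words (0xFFFFFFFF times 1 has a high word of 0 here, but of 0xFFFFFFFF under mulhw). A minimal C sketch (`mulhwu_model` is a hypothetical name):

```c
#include <stdint.h>

static uint32_t mulhwu_model(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);   /* high word of the unsigned product */
}
```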


This instruction may execute faster on some implementations if rB contains the operand 
having the smaller absolute value. 


Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)


Note: The setting of CR0 bits LT, GT, and EQ is mode-dependent, and reflects
overflow of the 32-bit result.


PowerPC Architecture Level Supervisor Level Optional Form 





UISA XO 
mulli mulli


Multiply Low Immediate 


mulli rD,rA,SIMM 
[POWER mnemonic: muli] 


07 D A SIMM 
0 5 6 10 11 15 16 31 





prod[0–47] ← (rA) * SIMM
rD ← prod[16–47]


The 32-bit first operand is (rA). The 16-bit second operand is the value of the SIMM field.
The low-order 32 bits of the 48-bit product of the operands are placed into rD.


Both the operands and the product are interpreted as signed integers. The low-order 32 bits 
of the product are calculated independently of whether the operands are treated as signed 
or unsigned 32-bit integers. 
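Because only the low 32 bits are kept, unsigned wraparound arithmetic in C gives the same result as the signed product, which is the point made above. A minimal sketch (`mulli_model` is a hypothetical name):

```c
#include <stdint.h>

static uint32_t mulli_model(uint32_t ra, int16_t simm)
{
    /* Sign-extend SIMM, then multiply modulo 2^32: the low-order 32 bits
       are the same whether (rA) is viewed as signed or unsigned. */
    return ra * (uint32_t)(int32_t)simm;
}
```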


This instruction can be used with mulhdx or mulhwx to calculate a full 64-bit product. 


The low-order 32 bits of the product are the correct 32-bit product for 32-bit 
implementations. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 


UISA D 
mullwx mullwx


Multiply Low Word 


mullw rD,rA,rB (OE = 0 Rc = 0)
mullw. rD,rA,rB (OE = 0 Rc = 1)
mullwo rD,rA,rB (OE = 1 Rc = 0)
mullwo. rD,rA,rB (OE = 1 Rc = 1)


[POWER mnemonics: muls, muls., mulso, mulso.] 





31 D A B OE 235 Rc
0 5 6 10 11 15 16 20 21 22 30 31 


rD ← (rA) * (rB)
The 32-bit operands are the contents of rA and rB. The low-order 32 bits of the 64-bit 
product (rA) * (rB) are placed into rD. 


The low-order 32 bits of the product are the correct 32-bit product for 32-bit
implementations. The low-order 32 bits of the product are independent of whether the
operands are regarded as signed or unsigned 32-bit integers.


If OE = 1, then OV is set if the product cannot be represented in 32 bits. Both the operands 
and the product are interpreted as signed integers. 
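A C sketch of mullw with the OE = 1 overflow check, assuming a hypothetical `mullw_model` that returns the low 32 bits and reports OV through a pointer. The conversions between uint32_t and int32_t assume two's-complement representation, as on mainstream compilers.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t mullw_model(uint32_t ra, uint32_t rb, bool *ov)
{
    int64_t prod = (int64_t)(int32_t)ra * (int64_t)(int32_t)rb;  /* signed operands */
    *ov = prod < INT32_MIN || prod > INT32_MAX;   /* not representable in 32 bits */
    return (uint32_t)prod;                         /* low-order 32 bits into rD */
}
```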


Note that this instruction may execute faster on some implementations if rB contains the 
operand having the smaller absolute value. 


Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)
Note: CR0 field may not reflect the infinitely precise result if overflow occurs (see
XER below).
• XER:
Affected: SO, OV (if OE = 1)


Note: The setting of the affected bits in the XER is mode-independent, and reflects 
overflow of the 32-bit result. 


PowerPC Architecture Level Supervisor Level Optional Form 





UISA XO 
nandx nandx

















NAND 
nand rA,rS,rB (Rc = 0)
nand. rA,rS,rB (Rc = 1)
31 S A B 476 Rc
0 5 6 10 11 15 16 20 21 30 31 


rA ← ¬((rS) & (rB))
The contents of rS are ANDed with the contents of rB and the complemented result is 
placed into rA. 


nand with rS = rB can be used to obtain the one's complement. 


Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)
PowerPC Architecture Level Supervisor Level Optional Form 
UISA X 
negx negx








Negate 

neg rD,rA (OE = 0 Rc = 0)
neg. rD,rA (OE = 0 Rc = 1)
nego rD,rA (OE = 1 Rc = 0)
nego. rD,rA (OE = 1 Rc = 1)

[_] Reserved

31 D A 00000 OE 104 Rc

0 5 6 10 11 15 16 20 21 22 30 31 


rD ← ¬(rA) + 1
The value 1 is added to the complement of the value in rA, and the resulting two’s 
complement is placed into rD. 


If rA contains the most negative 32-bit number (0x8000_0000), the result is the most
negative number and, if OE = 1, OV is set.
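A minimal C sketch of neg and its single overflow case (`neg_model` is a hypothetical name; unsigned arithmetic is used so that negating 0x8000_0000 is well defined in C):

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t neg_model(uint32_t ra, bool *ov)
{
    *ov = (ra == 0x80000000u);   /* the only value whose negation overflows */
    return ~ra + 1u;             /* two's complement: NOT(rA) + 1 */
}
```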


Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)
• XER:
Affected: SO, OV (if OE = 1)
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XO 
norx norx


NOR 
nor rA,rS,rB (Rc = 0)
nor. rA,rS,rB (Rc = 1)
31 S A B 124 Rc
0 5 6 10 11 15 16 20 21 30 31


rA ← ¬((rS) | (rB))
The contents of rS are ORed with the contents of rB and the complemented result is placed 
into rA. 


nor with rS = rB can be used to obtain the one’s complement. 
Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)


Simplified mnemonics: 





not rA,rS equivalent to nor rA,rS,rS
PowerPC Architecture Level Supervisor Level Optional Form 
UISA X 
orx orx





OR 
or rA,rS,rB (Rc = 0)
or. rA,rS,rB (Rc = 1)
31 S A B 444 Rc
0 5 6 10 11 15 16 20 21 30 31
rA ← (rS) | (rB)


The contents of rS are ORed with the contents of rB and the result is placed into rA. 


The simplified mnemonic mr (shown below) demonstrates the use of the or instruction to 
move register contents. 


Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)


Simplified mnemonics: 





mr rA,rS equivalent to or rA,rS,rS
PowerPC Architecture Level Supervisor Level Optional Form 
UISA X 
orcx orcx


OR with Complement 





orc rA,rS,rB (Rc = 0)
orc. rA,rS,rB (Rc = 1)
31 S A B 412 Rc
0 5 6 10 11 15 16 20 21 30 31
rA ← (rS) | ¬(rB)


The contents of rS are ORed with the complement of the contents of rB and the result is 
placed into rA. 


Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)
PowerPC Architecture Level Supervisor Level Optional Form 
UISA X 
ori ori


OR Immediate 


ori rA,rS,UIMM
[POWER mnemonic: oril] 





24 S A UIMM
0 5 6 10 11 15 16 31 


rA ← (rS) | ((16)0 || UIMM)
The contents of rS are ORed with 0x0000 || UIMM and the result is placed into rA.


The preferred no-op (an instruction that does nothing) is ori 0,0,0. 


Other registers altered: 


• None


Simplified mnemonics: 





nop equivalent to ori 0,0,0 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA D 
oris oris


OR Immediate Shifted 


oris rA,rS,UIMM


[POWER mnemonic: oriu] 
25 S A UIMM
0 5 6 10 11 15 16 31


rA ← (rS) | (UIMM || (16)0)
The contents of rS are ORed with UIMM || 0x0000 and the result is placed into rA.
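Since neither ori nor oris sign-extends its immediate, applying oris and then ori to a register that holds 0 produces any 32-bit constant. A C sketch with hypothetical model functions:

```c
#include <stdint.h>

/* ori: OR the zero-extended immediate into the low halfword. */
static uint32_t ori_model(uint32_t rs, uint16_t uimm)
{
    return rs | (uint32_t)uimm;
}

/* oris: OR the immediate into the high halfword. */
static uint32_t oris_model(uint32_t rs, uint16_t uimm)
{
    return rs | ((uint32_t)uimm << 16);
}
```

For instance, `ori_model(oris_model(0, 0x1234), 0x5678)` yields 0x12345678.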


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA D 
rfi rfi


Return from Interrupt 





rfi
[_] Reserved
19 00000 00000 00000 50 0
0 5 6 10 11 15 16 20 21 30 31 


MSR[16–23, 25–27, 30–31] ← SRR1[16–23, 25–27, 30–31]

NIA ←iea SRR0[0–29] || 0b00
Bits SRR1[16—23, 25-27, 30-31] are placed into the corresponding bits of the MSR. If the 
new MSR value does not enable any pending exceptions, then the next instruction is 
fetched, under control of the new MSR value, from the address SRRO[0—29] || ObOO. If the 
new MSR value enables one or more pending exceptions, the exception associated with the 
highest priority pending exception is generated; in this case the value placed into SRRO by 
the exception processing mechanism is the address of the instruction that would have been 
executed next had the exception not occurred. Note that an implementation may define 
additional MSR bits, and in this case, may also cause them to be saved to SRR1 from MSR 
on an exception and restored to MSR from SRRI1 on an rfi. 
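The bit selection in the rfi pseudocode is easier to check as a mask. The C sketch below assumes IBM bit numbering (bit 0 is the most-significant bit of the 32-bit MSR), assumes MSR bits outside the copied ranges keep their old values, and ignores implementation-defined MSR bits and pending exceptions. The names ibm_range, rfi_msr_mask, and rfi_model are hypothetical.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most-significant bit of a 32-bit register. */
#define IBM_BIT(k) (UINT32_C(1) << (31u - (k)))

/* Mask with IBM bits first..last (inclusive) set; assumes first <= last <= 31. */
static uint32_t ibm_range(unsigned first, unsigned last)
{
    uint32_t m = 0;
    for (unsigned k = first; k <= last; k++)
        m |= IBM_BIT(k);
    return m;
}

/* MSR bits that rfi copies from SRR1: 16-23, 25-27, and 30-31. */
static uint32_t rfi_msr_mask(void)
{
    return ibm_range(16, 23) | ibm_range(25, 27) | ibm_range(30, 31);
}

/* Architected register effects of rfi only (see assumptions above). */
static void rfi_model(uint32_t *msr, uint32_t *nia, uint32_t srr0, uint32_t srr1)
{
    uint32_t m = rfi_msr_mask();
    *msr = (*msr & ~m) | (srr1 & m);   /* MSR[16-23, 25-27, 30-31] <- SRR1[...] */
    *nia = srr0 & ~UINT32_C(3);        /* SRR0[0-29] || 0b00 */
}
```

The resulting mask is 0x0000FF73, and the two low-order bits of SRR0 never reach the NIA, so the return address is always word-aligned.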


This is a supervisor-level, context synchronizing instruction. This instruction is defined 
only for 32-bit implementations. Using it on a 64-bit implementation causes an illegal 
instruction type program exception. 


Other registers altered: 
• MSR


PowerPC Architecture Level Supervisor Level Optional Form 





OEA V XL 
rlwimix rlwimix


Rotate Left Word Immediate then Mask Insert 


rlwimi rA,rS,SH,MB,ME (Rc = 0)
rlwimi. rA,rS,SH,MB,ME (Rc = 1)


[POWER mnemonics: rlimi, rlimi.] 





20 S A SH MB ME Rc
0 5 6 10 11 15 16 20 21 25 26 30 31


n ← SH
r ← ROTL(rS, n)
m ← MASK(MB, ME)
rA ← (r & m) | (rA & ¬m)

The contents of rS are rotated left the number of bits specified by operand SH. A mask is
generated having 1 bits from bit MB through bit ME and 0 bits elsewhere. The rotated data
is inserted into rA under control of the generated mask.














Note that rlwimi can be used to insert a bit field into the contents of rA using the methods 
shown below: 


• To insert an n-bit field that is left-justified in rS into rA starting at bit position b, set
SH = 32 - b, MB = b, and
ME = (b + n) - 1.


• To insert an n-bit field that is right-justified in rS into rA starting at bit position b,
set SH = 32 - (b + n), MB = b, and ME = (b + n) - 1.
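A minimal C sketch of the rlwimi data flow, assuming IBM bit numbering (bit 0 is the most-significant bit) and the usual MASK wrap-around when MB > ME. The helper names rotl32, mask32, and rlwimi are illustrative, not an existing API. The rotate special-cases n = 0 because shifting a 32-bit value by 32 is undefined in C. The test values exercise the two insertion recipes above.

```c
#include <stdint.h>

/* ROTL(x, n) for a 32-bit word; n is taken modulo 32. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* MASK(mb, me), IBM bit numbering (bit 0 = MSB), mb and me in 0..31.
   Ones from bit mb through bit me; if mb > me the ones wrap from bit 31 to bit 0. */
static uint32_t mask32(unsigned mb, unsigned me)
{
    uint32_t from_mb = UINT32_C(0xFFFFFFFF) >> mb;         /* bits mb..31 */
    uint32_t to_me   = UINT32_C(0xFFFFFFFF) << (31u - me);  /* bits 0..me */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwimi: rA <- (r & m) | (rA & ~m), r = ROTL(rS, SH), m = MASK(MB, ME). */
static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    uint32_t r = rotl32(rs, sh);
    uint32_t m = mask32(mb, me);
    return (r & m) | (ra & ~m);
}
```

For example, inslwi rA,rS,8,8 corresponds to rlwimi(ra, rs, 24, 8, 15): the top byte of rS lands in bits 8-15 of rA and every other bit of rA is preserved.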
Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)


Simplified mnemonics: 





inslwi rA,rS,n,b equivalent to rlwimi rA,rS,32 - b,b,b + n - 1
insrwi rA,rS,n,b (n > 0) equivalent to rlwimi rA,rS,32 - (b + n),b,(b + n) - 1
PowerPC Architecture Level Supervisor Level Optional Form 
UISA M 




















8-168 PowerPC Microprocessor Family: The Programming Environments (32-Bit) 


rlwinmx rlwinmx


Rotate Left Word Immediate then AND with Mask 


rlwinm rA,rS,SH,MB,ME (Rc = 0)
rlwinm. rA,rS,SH,MB,ME (Rc = 1)


[POWER mnemonics: rlinm, rlinm.] 


21 S A SH MB ME Rc
0 5 6 10 11 15 16 20 21 25 26 30 31


n ← SH
r ← ROTL(rS, n)
m ← MASK(MB, ME)
rA ← r & m
The contents of rS are rotated left the number of bits specified by operand SH. A
mask is generated having 1 bits from bit MB through bit ME and 0 bits elsewhere. The
rotated data is ANDed with the generated mask and the result is placed into rA.














Note that rlwinm can be used to extract, rotate, shift, and clear bit fields using the methods 
shown below: 


• To extract an n-bit field that starts at bit position b in rS, right-justified into rA
(clearing the remaining 32 - n bits of rA), set SH = b + n,
MB = 32 - n, and ME = 31.


• To extract an n-bit field that starts at bit position b in rS, left-justified into rA
(clearing the remaining 32 - n bits of rA), set SH = b, MB = 0, and ME = n - 1.


• To rotate the contents of a register left by n bits, set SH = n, MB = 0, and ME = 31; to
rotate right by n bits, set SH = 32 - n instead.


• To shift the contents of a register right by n bits, set SH = 32 - n, MB = n, and
ME = 31.


• To clear the high-order b bits of a register and then shift the result left by n bits,
set SH = n, MB = b - n, and ME = 31 - n.


• To clear the low-order n bits of a register, set SH = 0, MB = 0, and
ME = 31 - n.
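The same ROTL/MASK model covers rlwinm. In this hedged C sketch (IBM bit numbering, MASK wrapping when MB > ME, illustrative helper names), the test values reproduce several of the recipes above and of the simplified mnemonics that follow.

```c
#include <stdint.h>

/* ROTL(x, n) for a 32-bit word; n is taken modulo 32 (n = 0 avoids x >> 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* MASK(mb, me), IBM bit numbering; wraps around when mb > me. */
static uint32_t mask32(unsigned mb, unsigned me)
{
    uint32_t from_mb = UINT32_C(0xFFFFFFFF) >> mb;         /* bits mb..31 */
    uint32_t to_me   = UINT32_C(0xFFFFFFFF) << (31u - me);  /* bits 0..me */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwinm: rA <- ROTL(rS, SH) & MASK(MB, ME). */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & mask32(mb, me);
}
```

For instance, extrwi rA,rS,8,8 (extract bits 8-15, right-justified) becomes rlwinm(rs, 16, 24, 31), and srwi rA,rS,4 becomes rlwinm(rs, 28, 4, 31).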
Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)


Chapter 8. Instruction Set 8-169 


Simplified mnemonics: 





extlwi rA,rS,n,b (n > 0) equivalent to rlwinm rA,rS,b,0,n - 1
extrwi rA,rS,n,b (n > 0) equivalent to rlwinm rA,rS,b + n,32 - n,31
rotlwi rA,rS,n equivalent to rlwinm rA,rS,n,0,31
rotrwi rA,rS,n equivalent to rlwinm rA,rS,32 - n,0,31
slwi rA,rS,n (n < 32) equivalent to rlwinm rA,rS,n,0,31 - n
srwi rA,rS,n (n < 32) equivalent to rlwinm rA,rS,32 - n,n,31
clrlwi rA,rS,n (n < 32) equivalent to rlwinm rA,rS,0,n,31
clrrwi rA,rS,n (n < 32) equivalent to rlwinm rA,rS,0,0,31 - n
clrlslwi rA,rS,b,n (n ≤ b < 32) equivalent to rlwinm rA,rS,n,b - n,31 - n
PowerPC Architecture Level Supervisor Level Optional Form 
UISA M 






















rlwnmx rlwnmx


Rotate Left Word then AND with Mask 


rlwnm rA,rS,rB,MB,ME (Rc = 0)
rlwnm. rA,rS,rB,MB,ME (Rc = 1)


[POWER mnemonics: rlnm, rlnm.]





23 S A B MB ME Rc
0 5 6 10 11 15 16 20 21 25 26 30 31


n ← rB[27-31]
r ← ROTL(rS, n)
m ← MASK(MB, ME)
rA ← r & m
The contents of rS are rotated left the number of bits specified by the low-order five bits of 
rB. A mask is generated having 1 bits from bit MB through bit ME and 0 bits elsewhere. 


The rotated data is ANDed with the generated mask and the result is placed into rA. 














Note that rlwnm can be used to extract and rotate bit fields using the methods shown as 
follows: 


• To extract an n-bit field that starts at variable bit position b in rS, right-justified into
rA (clearing the remaining 32 - n bits of rA), set the low-order five bits of
rB to b + n, MB = 32 - n, and ME = 31.

• To extract an n-bit field that starts at variable bit position b in rS, left-justified into
rA (clearing the remaining 32 - n bits of rA), set the low-order five bits of
rB to b, MB = 0, and ME = n - 1.

• To rotate the contents of a register left by n bits, set the low-order five bits of rB
to n, MB = 0, and ME = 31; to rotate right by n bits, set those bits to 32 - n instead.
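rlwnm differs from rlwinm only in where the rotate count comes from. This C sketch (same assumptions as a plain ROTL/MASK model: IBM bit numbering, illustrative names) makes explicit that only rB[27-31], the low-order five bits of rB, are used, so a count of 33 behaves like 1.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* MASK(mb, me) with IBM bit numbering; wraps when mb > me. */
static uint32_t mask32(unsigned mb, unsigned me)
{
    uint32_t from_mb = UINT32_C(0xFFFFFFFF) >> mb;
    uint32_t to_me   = UINT32_C(0xFFFFFFFF) << (31u - me);
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwnm: rotate count is rB[27-31]; all higher-order bits of rB are ignored. */
static uint32_t rlwnm(uint32_t rs, uint32_t rb, unsigned mb, unsigned me)
{
    return rotl32(rs, rb & 0x1Fu) & mask32(mb, me);
}
```

With MB = 0 and ME = 31 this is rotlw; with rB = b + n, MB = 32 - n, and ME = 31 it extracts a field whose position is only known at run time.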


Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)
Simplified mnemonics: 
rotlw rA,rS,rB equivalent to rlwnm rA,rS,rB,0,31
PowerPC Architecture Level Supervisor Level Optional Form 
UISA M 






















sc sc 


System Call 


[POWER mnemonic: svca] 


[_] Reserved 





17 00000 00000 0000 0000 0000 00 1 0
0 5 6 10 11 15 16 29 30 31 





In the PowerPC UISA, the sc instruction calls the operating system to perform a service.
When control is returned to the program that executed the system call, the content of the 
registers depends on the register conventions used by the program providing the system 
service. 


This instruction is context synchronizing, as described in Section 4.1.5.1, “Context 
Synchronizing Instructions.” 


Other registers altered: 


• Dependent on the system service


In the PowerPC OEA, the sc instruction does the following:


SRR0 ←iea CIA + 4

SRR1[1-4, 10-15] ← 0

SRR1[16-23, 25-27, 30-31] ← MSR[16-23, 25-27, 30-31]
MSR ← new_value (see below)

NIA ←iea base_ea + 0xC00 (see below)


The EA of the instruction following the sc instruction is placed into SRR0. Bits 16-23,
25-27, and 30-31 of the MSR are placed into the corresponding bits of SRR1, and bits 1-4
and 10-15 of SRR1 are set to undefined values. Note that an implementation may define
additional MSR bits and, in this case, may also save them from the MSR to SRR1
on an exception and restore them from SRR1 to the MSR on an rfi.


Then a system call exception is generated. The exception causes the MSR to be altered as 
described in Section 6.4, “Exception Definitions.” 


The exception causes the next instruction to be fetched from offset 0xC00 from the physical
base address determined by the new setting of MSR[IP]. 


Other registers altered: 





• SRR0
• SRR1
• MSR
PowerPC Architecture Level Supervisor Level Optional Form 
UISA/OEA SC

























slwx slwx 
Shift Left Word 
slw rA,rS,rB (Rc = 0)
slw. rA,rS,rB (Rc = 1)
[POWER mnemonics: sl, sl.]
31 S A B 24 Rc
0 5 6 10 11 15 16 20 21 30 31 


n ← rB[27-31]
r ← ROTL(rS, n)
if rB[26] = 0 then m ← MASK(0, 31 - n)
else m ← (32)0
rA ← r & m


If bit 26 of rB = 0, the contents of rS are shifted left the number of bits specified by
rB[27-31]. Bits shifted out of position 0 are lost. Zeros are supplied to the vacated positions
on the right. The 32-bit result is placed into rA. If bit 26 of rB = 1, 32 zeros are placed into
rA.
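A C sketch of slw, assuming rB[26] corresponds to the value 0x20 and rB[27-31] to the mask 0x1F under IBM bit numbering. Testing rB[26] explicitly also matters in C itself, where shifting a 32-bit value by 32 or more is undefined behaviour. The function name is illustrative.

```c
#include <stdint.h>

/* slw: only rB[26-31] (the low-order six bits of rB) matter.
   rB[26] = 1 means a shift count of 32 or more, so the result is zero. */
static uint32_t slw(uint32_t rs, uint32_t rb)
{
    if (rb & 0x20u)              /* rB[26] */
        return 0;
    return rs << (rb & 0x1Fu);   /* rB[27-31]: 0..31, so the C shift is well defined */
}
```

A count of 64 has rB[26] = 0 and rB[27-31] = 0, so it behaves like a shift of zero.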


Other registers altered:


• Condition Register (CR0 field):


Affected: LT, GT, EQ, SO (if Rc = 1)


PowerPC Architecture Level Supervisor Level Optional Form
UISA X
















srawx srawx
Shift Right Algebraic Word 


sraw rA,rS,rB (Rc = 0)
sraw. rA,rS,rB (Rc = 1)


[POWER mnemonics: sra, sra.] 





31 S A B 792 Rc
0 5 6 10 11 15 16 20 21 30 31


n ← rB[27-31]
r ← ROTL(rS, 32 - n)
if rB[26] = 0 then m ← MASK(n, 31)
else m ← (32)0
S ← rS[0]
rA ← (r & m) | ((32)S & ¬m)
XER[CA] ← S & ((r & ¬m) ≠ 0)


If rB[26] = 0, the contents of rS are shifted right the number of bits specified by
rB[27-31]. Bits shifted out of position 31 are lost. The result is padded on the left with sign
bits before being placed into rA. If rB[26] = 1, rA is filled with 32 copies of the sign bit
(bit 0) of rS. If Rc = 1, CR0 is set based on the value written into rA. XER[CA] is set if rS
contains a negative number and any 1 bits are shifted out of position 31; otherwise XER[CA]
is cleared. A shift amount of zero causes XER[CA] to be cleared.


Note that the sraw instruction, followed by addze, can be used to divide quickly by 2^n. The
setting of the XER[CA] bit by sraw is independent of mode.
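A C sketch of sraw including the XER[CA] rule, assuming rB[26] is the value 0x20 and rB[27-31] the mask 0x1F. The arithmetic shift is built from unsigned operations because right-shifting a negative signed integer is implementation-defined in C. The struct and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of sraw: the value written to rA and the new XER[CA]. */
struct sraw_result {
    uint32_t ra;
    bool ca;
};

static struct sraw_result sraw(uint32_t rs, uint32_t rb)
{
    struct sraw_result res;
    bool negative = (rs >> 31) != 0;                 /* S = rS[0] */

    if (rb & 0x20u) {                                /* rB[26] = 1: shift of 32 or more */
        res.ra = negative ? UINT32_C(0xFFFFFFFF) : 0u;  /* 32 copies of the sign bit */
        res.ca = negative;                           /* a negative rS always loses 1 bits */
        return res;
    }

    unsigned n = rb & 0x1Fu;                         /* rB[27-31] */
    uint32_t lost = rs & ((UINT32_C(1) << n) - 1u);  /* bits shifted out of position 31 */
    uint32_t fill = negative ? ~(UINT32_C(0xFFFFFFFF) >> n) : 0u;  /* sign bits on the left */
    res.ra = (rs >> n) | fill;
    res.ca = negative && lost != 0;                  /* n = 0 gives lost = 0, so CA = 0 */
    return res;
}
```

Shifting -7 (0xFFFFFFF9) right by one gives -4 with CA = 1, because a 1 bit was lost; that CA bit is what addze later adds back.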
Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)
• XER:
Affected: CA


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 






















srawix srawix


Shift Right Algebraic Word Immediate 


srawi rA,rS,SH (Rc = 0)
srawi. rA,rS,SH (Rc = 1)


[POWER mnemonics: srai, srai.] 





31 S A SH 824 Rc
0 5 6 10 11 15 16 20 21 30 31 

n ← SH
r ← ROTL(rS, 32 - n)
m ← MASK(n, 31)
S ← rS[0]
rA ← (r & m) | ((32)S & ¬m)
XER[CA] ← S & ((r & ¬m) ≠ 0)


The contents of rS are shifted right the number of bits specified by operand SH. Bits shifted
out of position 31 are lost. The shifted value is sign-extended, and the 32-bit result is placed
into rA. XER[CA] is set if rS contains a negative number and
any 1 bits are shifted out of position 31; otherwise XER[CA] is cleared. A shift amount of
zero causes XER[CA] to be cleared.


Note that the srawi instruction, followed by addze, can be used to divide quickly by 2^n.
The setting of the CA bit by srawi is independent of mode.
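The divide idiom works because XER[CA] is set exactly when a negative dividend loses 1 bits, so adding CA back (addze) turns the floor produced by the arithmetic shift into a quotient rounded toward zero, which matches C's signed division. A C sketch of the two-instruction sequence srawi rA,rS,n / addze rA,rA; div_pow2 is a hypothetical name and n is assumed to be 0..31.

```c
#include <stdbool.h>
#include <stdint.h>

/* srawi rA,rS,n followed by addze rA,rA: signed x / 2^n, rounded toward zero. */
static int32_t div_pow2(int32_t x, unsigned n)
{
    uint32_t rs = (uint32_t)x;                       /* two's-complement bit pattern */
    bool negative = (rs >> 31) != 0;

    /* srawi: arithmetic shift right by n, plus the XER[CA] rule. */
    uint32_t fill = negative ? ~(UINT32_C(0xFFFFFFFF) >> n) : 0u;
    uint32_t ra = (rs >> n) | fill;
    bool ca = negative && (rs & ((UINT32_C(1) << n) - 1u)) != 0;

    /* addze: rA <- (rA) + XER[CA]. */
    ra += ca ? 1u : 0u;

    /* Convert back to int32_t without implementation-defined behaviour. */
    return (ra <= INT32_MAX) ? (int32_t)ra : -(int32_t)(~ra) - 1;
}
```

For -7 / 2 the shift alone gives -4; CA = 1, so addze produces -3, the same result as the C expression -7 / 2.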
Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)
• XER:
Affected: CA


PowerPC Architecture Level Supervisor Level Optional Form 











UISA X 
















srwx srwx
Shift Right Word 


srw rA,rS,rB (Rc = 0)
srw. rA,rS,rB (Rc = 1)


[POWER mnemonics: sr, sr.]
31 S A B 536 Rc
0 5 6 10 11 15 16 20 21 30 31


n ← rB[27-31]
r ← ROTL(rS, 32 - n)
if rB[26] = 0 then m ← MASK(n, 31)
else m ← (32)0
rA ← r & m


The contents of rS are shifted right the number of bits specified by the low-order six bits of
rB (rB[26-31]). Bits shifted out of position 31 are lost. Zeros are supplied to the vacated
positions on the left. The result is placed into rA. If rB[26] = 1, 32 zeros are placed into rA.
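A C sketch of srw under the same assumptions as the other shift models (rB[26] is 0x20, rB[27-31] is 0x1F; illustrative name). Unlike sraw, zeros always enter on the left and XER[CA] is not touched.

```c
#include <stdint.h>

/* srw: logical right shift; rB[26] = 1 (a count of 32 or more) yields zero. */
static uint32_t srw(uint32_t rs, uint32_t rb)
{
    if (rb & 0x20u)              /* rB[26] */
        return 0;
    return rs >> (rb & 0x1Fu);   /* rB[27-31]: 0..31 */
}
```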

Other registers altered: 


• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)
PowerPC Architecture Level Supervisor Level Optional Form 
UISA X 






















stb stb 





Store Byte 
stb rS,d(rA)
38 S A d
0 5 6 10 11 15 16 31


if rA = 0 then b ← 0
else b ← (rA)

EA ← b + EXTS(d)
MEM(EA, 1) ← rS[24-31]


EA is the sum (rA|0) + d. The contents of the low-order eight bits of rS are stored into the
byte in memory addressed by EA. 
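The (rA|0) notation means that a register field of 0 selects the value zero rather than the contents of r0. A C sketch of the D-form EA rule and of stb, assuming a hypothetical gpr[] array for the 32 GPRs and a small byte array for memory; ea_dform and stb_model are illustrative names, and the bounds check exists only to keep the model safe.

```c
#include <stddef.h>
#include <stdint.h>

/* (rA|0) + EXTS(d): the EA rule shared by the D-form stores. */
static uint32_t ea_dform(const uint32_t gpr[32], unsigned ra, int16_t d)
{
    uint32_t b = (ra == 0) ? 0u : gpr[ra];   /* rA = 0 means the value 0, not r0 */
    return b + (uint32_t)(int32_t)d;          /* EXTS(d); the sum wraps modulo 2^32 */
}

/* stb: MEM(EA, 1) <- rS[24-31], the low-order byte of rS. */
static int stb_model(uint8_t *mem, size_t mem_size, const uint32_t gpr[32],
                     unsigned rs, unsigned ra, int16_t d)
{
    uint32_t ea = ea_dform(gpr, ra, d);
    if (ea >= mem_size)
        return -1;                            /* outside the modelled memory */
    mem[ea] = (uint8_t)(gpr[rs] & 0xFFu);
    return 0;
}
```

Even if r0 holds a nonzero value, stb rS,16(0) writes to address 16.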


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 


UISA D 

























stbu stbu 


Store Byte with Update 


stbu rS,d(rA) 
39 S A d
0 5 6 10 11 15 16 31


EA ← (rA) + EXTS(d)
MEM(EA, 1) ← rS[24-31]
rA ← EA


EA is the sum (rA) + d. The contents of the low-order eight bits of rS are stored into the 
byte in memory addressed by EA. 


EA is placed into rA. 
If rA = 0, the instruction form is invalid. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA D 






















stbux stbux


Store Byte with Update Indexed





stbux rS,rA,rB
[_] Reserved
31 S A B 247 0
0 5 6 10 11 15 16 20 21 30 31
EA ← (rA) + (rB)
MEM(EA, 1) ← rS[24-31]
rA ← EA


EA is the sum (rA) + (rB). The contents of the low-order eight bits of rS are stored into the
byte in memory addressed by EA.
EA is placed into rA. 
If rA = 0, the instruction form is invalid. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form
UISA X










stbx stbx 


Store Byte Indexed 














stbx rS,rA,rB
[_] Reserved
31 S A B 215 0
0 5 6 10 11 15 16 20 21 30 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
MEM(EA, 1) ← rS[24-31]
EA is the sum (rA|0) + (rB). The contents of the low-order eight bits of rS are stored into
the byte in memory addressed by EA.


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form
UISA X










stfd stfd


Store Floating-Point Double 
stfd frS,d(rA) 


54 S A d
0 5 6 10 11 15 16 31





if rA = 0 then b ← 0
else b ← (rA)
EA ← b + EXTS(d)
MEM(EA, 8) ← (frS)


EA is the sum (rA|0) + d.


The contents of register frS are stored into the double word in memory addressed by EA. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA D 






















stfdu stfdu


Store Floating-Point Double with Update 
stfdu frS,d(rA) 





55 S A d
0 5 6 10 11 15 16 31
EA ← (rA) + EXTS(d)


MEM(EA, 8) ← (frS)
rA ← EA


EA is the sum (rA) + d. 

The contents of register frS are stored into the double word in memory addressed by EA. 
EA is placed into rA. 

If rA = 0, the instruction form is invalid. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA D 






















stfdux stfdux


Store Floating-Point Double with Update Indexed 








stfdux frS,rA,rB 
[_] Reserved 
31 S A B 759 0
0 5 6 10 11 15 16 20 21 30 31


EA ← (rA) + (rB)
MEM(EA, 8) ← (frS)
rA ← EA


EA is the sum (rA) + (rB). 
The contents of register frS are stored into the double word in memory addressed by EA. 
EA is placed into rA. 


If rA = 0, the instruction form is invalid. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 


UISA X 

























stfdx stfdx


Store Floating-Point Double Indexed 


stfdx frS,rA,rB 


[_] Reserved 





31 S A B 727 0
0 5 6 10 11 15 16 20 21 30 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
MEM(EA, 8) ← (frS)


EA is the sum (rA|0) + (rB).


The contents of register frS are stored into the double word in memory addressed by EA. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form
UISA X



















stfiwx stfiwx 


Store Floating-Point as Integer Word Indexed 


stfiwx frS,rA,rB 

[_] Reserved 
31 S A B 983 0
0 5 6 10 11 15 16 20 21 30 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
MEM(EA, 4) ← frS[32-63]


EA is the sum (rA|0) + (rB).


The contents of the low-order 32 bits of frS are stored, without conversion, into the word in
memory addressed by EA.


If the contents of register frS were produced, either directly or indirectly, by an lfs
instruction, a single-precision arithmetic instruction, or frsp, then the value stored is 
undefined. The contents of frS are produced directly by such an instruction if frS is the 
target register for the instruction. The contents of frS are produced indirectly by such an 
instruction if frS is the final target register of a sequence of one or more floating-point move 
instructions, with the input to the sequence having been produced directly by such an 
instruction. 


This instruction is defined as optional by the PowerPC architecture to ensure backwards 
compatibility with earlier processors; however, it will likely be required for subsequent 
PowerPC processors. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA V X 






















stfs stfs 


Store Floating-Point Single 
stfs frS,d(rA) 


52 S A d
0 5 6 10 11 15 16 31 





if rA = 0 then b ← 0
else b ← (rA)

EA ← b + EXTS(d)

MEM(EA, 4) ← SINGLE(frS)


EA is the sum (rA|0) + d.


The contents of register frS are converted to single-precision and stored into the word in 
memory addressed by EA. Note that the value to be stored should be in single-precision 
format prior to the execution of the stfs instruction. For a discussion on floating-point store 
conversions, see Section D.7, “Floating-Point Store Instructions.” 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA D 






















stfsu stfsu 


Store Floating-Point Single with Update 


stfsu frS,d(rA) 





53 S A d
0 5 6 10 11 15 16 31


EA ← (rA) + EXTS(d)
MEM(EA, 4) ← SINGLE(frS)
rA ← EA


EA is the sum (rA) + d.


The contents of frS are converted to single-precision and stored into the word in memory 
addressed by EA. Note that the value to be stored should be in single-precision format prior 
to the execution of the stfsu instruction. For a discussion on floating-point store 
conversions, see Section D.7, “Floating-Point Store Instructions.” 


EA is placed into rA. 
If rA = 0, the instruction form is invalid. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 








UISA D 



















stfsux stfsux 





Store Floating-Point Single with Update Indexed 
stfsux frS,rA,rB 
[_] Reserved 
31 S A B 695 0
0 5 6 10 11 15 16 20 21 30 31


EA ← (rA) + (rB)
MEM(EA, 4) ← SINGLE(frS)
rA ← EA


EA is the sum (rA) + (rB). 


The contents of frS are converted to single-precision and stored into the word in memory 
addressed by EA. For a discussion on floating-point store conversions, see Section D.7, 
“Floating-Point Store Instructions.” 


EA is placed into rA. 


If rA = 0, the instruction form is invalid.


Other registers altered:


• None


PowerPC Architecture Level Supervisor Level Optional Form 


UISA X 

























stfsx stfsx


Store Floating-Point Single Indexed


stfsx frS,rA,rB


[_] Reserved 





31 S A B 663 0
0 5 6 10 11 15 16 20 21 30 31


if rA = 0 then b ← 0
else b ← (rA)

EA ← b + (rB)

MEM(EA, 4) ← SINGLE(frS)


EA is the sum (rA|0) + (rB).


The contents of register frS are converted to single-precision and stored into the word in 
memory addressed by EA. For a discussion on floating-point store conversions, see 
Section D.7, “Floating-Point Store Instructions.” 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form
UISA X










sth sth 


Store Half Word 
sth rS,d(rA)


44 S A d
0 5 6 10 11 15 16 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + EXTS(d)
MEM(EA, 2) ← rS[16-31]
EA is the sum (rA|0) + d. The contents of the low-order 16 bits of rS are stored into the half
word in memory addressed by EA.


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA D 






















sthbrx sthbrx


Store Half Word Byte-Reverse Indexed





sthbrx rS,rA,rB
[_] Reserved
31 S A B 918 0
0 5 6 10 11 15 16 20 21 30 31
if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
MEM(EA, 2) ← rS[24-31] || rS[16-23]


EA is the sum (rA|0) + (rB). The contents of the low-order eight bits of rS (rS[24-31]) are
stored into bits 0-7 of the half word in memory addressed by EA. The contents of the next
eight bits of rS (rS[16-23]) are stored into bits 8-15 of the half word in memory addressed by EA.


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form
UISA X










sthu sthu 


Store Half Word with Update 


sthu rS,d(rA)





45 S A d
0 5 6 10 11 15 16 31
EA ← (rA) + EXTS(d)


MEM(EA, 2) ← rS[16-31]
rA ← EA


EA is the sum (rA) + d. The contents of the low-order 16 bits of rS are stored into the half 
word in memory addressed by EA. 


EA is placed into rA. 
If rA = 0, the instruction form is invalid. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA D 






















sthux sthux 


Store Half Word with Update Indexed 


sthux rS,rA,rB


[_] Reserved 





31 S A B 439 0
0 5 6 10 11 15 16 20 21 30 31
EA ← (rA) + (rB)


MEM(EA, 2) ← rS[16-31]
rA ← EA


EA is the sum (rA) + (rB). The contents of the low-order 16 bits of rS are stored into the 
half word in memory addressed by EA. 


EA is placed into rA. 
If rA = 0, the instruction form is invalid. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 






















sthx sthx 


Store Half Word Indexed 


sthx rS,rA,rB


[_] Reserved
31 S A B 407 0
0 5 6 10 11 15 16 20 21 30 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
MEM(EA, 2) ← rS[16-31]

EA is the sum (rA|0) + (rB). The contents of the low-order 16 bits of rS are stored into the
half word in memory addressed by EA.


Other registers altered:


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 






















stmw stmw 


Store Multiple Word 


stmw rS,d(rA) 
[POWER mnemonic: stm] 


47 S A d
0 5 6 10 11 15 16 31


if rA = 0 then b ← 0
else b ← (rA)
EA ← b + EXTS(d)
r ← rS
do while r ≤ 31
    MEM(EA, 4) ← GPR(r)
    r ← r + 1
    EA ← EA + 4


EA is the sum (rA|0) + d.
n = (32 - rS).


n consecutive words starting at EA are stored from the GPRs rS through r31. For example, 
if rS = 30, 2 words are stored. 
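The register walk can be sketched in C as follows. Assumptions: gpr[] is a hypothetical array standing in for the 32 GPRs, mem is a byte array large enough for ea + 4 * (32 - rs) bytes, words are stored big-endian, and the EA alignment requirement described below is not checked. stmw_model is an illustrative name.

```c
#include <stdint.h>

/* stmw model: store GPRs rS..r31 as big-endian words starting at mem[ea].
   Returns n = 32 - rS, the number of words stored (rs assumed 0..31). */
static unsigned stmw_model(uint8_t *mem, uint32_t ea, const uint32_t gpr[32], unsigned rs)
{
    for (unsigned r = rs; r <= 31; r++, ea += 4) {
        mem[ea]     = (uint8_t)(gpr[r] >> 24);   /* GPR(r)[0-7] at the lowest address */
        mem[ea + 1] = (uint8_t)(gpr[r] >> 16);
        mem[ea + 2] = (uint8_t)(gpr[r] >> 8);
        mem[ea + 3] = (uint8_t)gpr[r];
    }
    return 32u - rs;
}
```

With rS = 30, r30 lands at EA and r31 at EA + 4, matching the two-word example above.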


EA must be a multiple of four. If it is not, either the system alignment exception handler is 
invoked or the results are boundedly undefined. For additional information about alignment 
and DSI exceptions, see Section 6.4.3, “DSI Exception (0x00300).” 


Note that, in some implementations, this instruction is likely to have a greater latency and 
take longer to execute, perhaps much longer, than a sequence of individual load or store 
instructions that produce the same results. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA D 






















stswi stswi 


Store String Word Immediate 


stswi rS,rA,NB
[POWER mnemonic: stsi] 


[_] Reserved 





31 S A NB 725 0
0 5 6 10 11 15 16 20 21 30 31 


if rA = 0 then EA ← 0
else EA ← (rA)

if NB = 0 then n ← 32
else n ← NB

r ← rS - 1

i ← 32

do while n > 0
    if i = 32 then r ← r + 1 (mod 32)
    MEM(EA, 1) ← GPR(r)[i-i + 7]
    i ← i + 8
    if i = 64 then i ← 32
    EA ← EA + 1
    n ← n - 1


EA is (rA|0). Let n = NB if NB ≠ 0, n = 32 if NB = 0; n is the number of bytes to store. Let
nr = CEIL(n ÷ 4); nr is the number of registers to supply data.


n consecutive bytes starting at EA are stored from GPRs rS through rS + nr - 1. Bytes are
stored left to right from each register. The sequence of registers wraps around through r0 if 
required. 
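The byte walk shared by stswi and stswx can be sketched in C. Assumptions: gpr[] is a hypothetical stand-in for the GPRs, mem[0] corresponds to EA, and alignment or segment-crossing behaviour is not modelled. stswi_count, regs_needed, and store_string are illustrative names; for stswx, n would come from XER[25-31] instead of NB.

```c
#include <stdint.h>

/* n for stswi: an NB field of 0 encodes 32 bytes. */
static unsigned stswi_count(unsigned nb)
{
    return nb == 0 ? 32u : nb;
}

/* nr = CEIL(n / 4): the number of registers that supply data. */
static unsigned regs_needed(unsigned n)
{
    return (n + 3u) / 4u;
}

/* Store n bytes to mem[0..n-1] from GPRs rS, rS + 1, ... (wrapping from
   r31 to r0), taking each register's bytes left to right, bits 0-7 first. */
static void store_string(uint8_t *mem, const uint32_t gpr[32], unsigned rs, unsigned n)
{
    unsigned r = rs % 32u;
    unsigned byte = 0;                    /* 0 selects bits 0-7, 3 selects bits 24-31 */
    for (unsigned k = 0; k < n; k++) {
        mem[k] = (uint8_t)(gpr[r] >> (24u - 8u * byte));
        if (++byte == 4u) {               /* register exhausted: move to the next one */
            byte = 0;
            r = (r + 1u) % 32u;
        }
    }
}
```

Storing six bytes starting at r31 uses all of r31 and then the two high-order bytes of r0, which is the wrap-around described above.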


Under certain conditions (for example, segment boundary crossing) the data alignment 
exception handler may be invoked. For additional information about data alignment 
exceptions, see Section 6.4.3, “DSI Exception (0x00300).” 


Note that, in some implementations, this instruction is likely to have a greater latency and 
take longer to execute, perhaps much longer, than a sequence of individual load or store 
instructions that produce the same results. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 


UISA X 

























stswx stswx


Store String Word Indexed 
stswx rS,rA,rB


[POWER mnemonic: stsx] 


[_] Reserved 


31 S A B 661 0 


0 5 6 10 11 15 16 20 21 30 31 














if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
n ← XER[25-31]
r ← rS - 1
i ← 32
do while n > 0
    if i = 32 then r ← r + 1 (mod 32)
    MEM(EA, 1) ← GPR(r)[i-i + 7]
    i ← i + 8
    if i = 64 then i ← 32
    EA ← EA + 1
    n ← n - 1


EA is the sum (rA|0) + (rB). Let n = XER[25-31]; n is the number of bytes to store. Let
nr = CEIL(n ÷ 4); nr is the number of registers to supply data.


n consecutive bytes starting at EA are stored from GPRs rS through rS + nr - 1. Bytes are
stored left to right from each register. The sequence of registers wraps around through r0 if 
required. If n = 0, no bytes are stored. 


Under certain conditions (for example, segment boundary crossing) the data alignment 
exception handler may be invoked. For additional information about data alignment 
exceptions, see Section 6.4.3, “DSI Exception (0x00300).” 


Note that, in some implementations, this instruction is likely to have a greater latency and 
take longer to execute, perhaps much longer, than a sequence of individual load or store 
instructions that produce the same results. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 






















stw stw 


Store Word 


stw rS,d(rA) 
[POWER mnemonic: st] 


36 S A d
0 5 6 10 11 15 16 31 





if rA = 0 then b ← 0
else b ← (rA)
EA ← b + EXTS(d)
MEM(EA, 4) ← (rS)


EA is the sum (rA|0) + d. The contents of rS are stored into the word in memory addressed
by EA. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA D 






















stwbrx stwbrx 


Store Word Byte-Reverse Indexed 


stwbrx rS,rA,rB 
[POWER mnemonic: stbrx] 





[_] Reserved 
31 S A B 662 0
0 5 6 10 11 15 16 20 21 30 31
if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
MEM(EA, 4) ← rS[24-31] || rS[16-23] || rS[8-15] || rS[0-7]


EA is the sum (rA|0) + (rB). The contents of rS[24-31] are stored into bits 0-7 of the word
in memory addressed by EA, rS[16-23] into bits 8-15, rS[8-15] into bits 16-23, and
rS[0-7] into bits 24-31. That is, the four bytes of rS are stored in reverse order.
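Comparing stw with stwbrx byte by byte shows the effect: on this big-endian architecture, stwbrx writes the little-endian image of rS, which is one way software can produce little-endian data (for example, for a device or file format that expects it). The C sketch below models the four bytes at EA as out[0..3]; the function names are illustrative, and the results do not depend on the host's own byte order.

```c
#include <stdint.h>

/* stw, for comparison: rS[0-7] goes to the byte at EA (big-endian order). */
static void stw_bytes(uint8_t out[4], uint32_t rs)
{
    out[0] = (uint8_t)(rs >> 24);    /* rS[0-7]   */
    out[1] = (uint8_t)(rs >> 16);    /* rS[8-15]  */
    out[2] = (uint8_t)(rs >> 8);     /* rS[16-23] */
    out[3] = (uint8_t)rs;            /* rS[24-31] */
}

/* stwbrx: MEM(EA, 4) <- rS[24-31] || rS[16-23] || rS[8-15] || rS[0-7]. */
static void stwbrx_bytes(uint8_t out[4], uint32_t rs)
{
    out[0] = (uint8_t)rs;            /* rS[24-31] */
    out[1] = (uint8_t)(rs >> 8);     /* rS[16-23] */
    out[2] = (uint8_t)(rs >> 16);    /* rS[8-15]  */
    out[3] = (uint8_t)(rs >> 24);    /* rS[0-7]   */
}
```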


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 






















stwcx. stwcx. 


Store Word Conditional Indexed 


stwcx. rS,rA,rB





31 S A B 150 1 
0 5 6 10 11 15 16 20 21 30 31 














if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
if RESERVE then
    if RESERVE_ADDR = physical_addr(EA)
        MEM(EA, 4) ← (rS)
        CR0 ← 0b00 || 0b1 || XER[SO]
    else
        u ← undefined 1-bit value
        if u then MEM(EA, 4) ← (rS)
        CR0 ← 0b00 || u || XER[SO]
    RESERVE ← 0
else
    CR0 ← 0b00 || 0b0 || XER[SO]


EA is the sum (rA|0) + (rB). If the reserved bit is set, the stwcx. instruction stores rS to
the word addressed by EA, clears the reserved bit, and sets CR0[EQ]. If the reserved bit
is not set, the stwcx. instruction does not do a store; it leaves the reserved bit cleared and
clears CR0[EQ]. Software must examine CR0[EQ] to determine whether the stwcx. was successful.


The reserved bit is set by the lwarx instruction. The reserved bit is cleared by any stwcx.
instruction to any address, and also by snooping logic if it detects that another processor 
does any kind of store to the block indicated in the reservation buffer when reserved is set. 


If a reservation exists, and the memory address specified by the stwcx. instruction is the
same as that specified by the load and reserve instruction that established the reservation, 
the contents of rS are stored into the word in memory addressed by EA and the reservation 
is cleared. 


If a reservation exists, but the memory address specified by the stwcx. instruction is not the
same as that specified by the load and reserve instruction that established the reservation, 
the reservation is cleared, and it is undefined whether the contents of rS are stored into the 
word in memory addressed by EA. 


If no reservation exists, the instruction completes without altering memory. 


The CR0 field is set to reflect whether the store operation was performed, as follows:


CR0[LT GT EQ SO] = 0b00 || store_performed || XER[SO]

EA must be a multiple of four. If it is not, either the system alignment exception handler is 
invoked or the results are boundedly undefined. For additional information about alignment 
and DSI exceptions, see Section 6.4.3, “DSI Exception (0x00300).” 




The granularity with which reservations are managed is implementation-dependent. 
Therefore, the memory to be accessed by the load and reserve and store conditional 
instructions should be allocated by a system library program. 
Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO


PowerPC Architecture Level Supervisor Level Optional Form 


UISA X 

























stwu stwu 


Store Word with Update 


stwu rS,d(rA) 
[POWER mnemonic: stu] 


37 S A d
0 5 6 10 11 15 16 31 





EA ← (rA) + EXTS(d)
MEM(EA, 4) ← (rS)
rA ← EA


EA is the sum (rA) + d. The contents of rS are stored into the word in memory addressed 
by EA. 


EA is placed into rA. 
If rA = 0, the instruction form is invalid. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA D 






















stwux stwux


Store Word with Update Indexed


stwux rS,rA,rB


[POWER mnemonic: stux]


[_] Reserved 





31 S A B 183 0
0 5 6 10 11 15 16 20 21 30 31


EA ← (rA) + (rB)
MEM(EA, 4) ← (rS)
rA ← EA


EA is the sum (rA) + (rB). The contents of rS are stored into the word in memory addressed
by EA.
EA is placed into rA.


If rA = 0, the instruction form is invalid. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form
UISA X










stwx stwx 


Store Word Indexed 
stwx rS,rA,rB


[POWER mnemonic: stx] 


[_] Reserved 


31 S A B 151 0
0 5 6 10 11 15 16 20 21 30 31 





if rA = 0 then b ← 0
else b ← (rA)
EA ← b + (rB)
MEM(EA, 4) ← rS
EA is the sum (rA|0) + (rB). The contents of rS are stored into the word in memory
addressed by EA.
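The (rA|0) notation means that an rA field of 0 selects the value 0 rather than the contents of GPR0. A small C sketch of the stwx effective-address calculation, assuming a hypothetical array-of-GPRs model and hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* (rA|0): a zero rA field yields the value 0, not the contents of GPR0. */
uint32_t ra_or_zero(const uint32_t gpr[32], unsigned ra)
{
    return ra == 0 ? 0u : gpr[ra];
}

/* Effective address for stwx rS,rA,rB: EA <- (rA|0) + (rB), modulo 2^32. */
uint32_t stwx_ea(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    return ra_or_zero(gpr, ra) + gpr[rb];
}
```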


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 














UISA X 













subfx subfx


Subtract From 





subf rD,rA,rB (OE = 0 Rc = 0)
subf. rD,rA,rB (OE = 0 Rc = 1)
subfo rD,rA,rB (OE = 1 Rc = 0)
subfo. rD,rA,rB (OE = 1 Rc = 1)
31 D A B OE 40 Rc
0 5 6 10 11 15 16 20 21 22 30 31 


rD ← ¬ (rA) + (rB) + 1


The sum ¬ (rA) + (rB) + 1 is placed into rD.
The subf instruction is preferred for subtraction because it sets few status bits. 
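The identity behind subf is that in 32-bit two's complement arithmetic ¬a + 1 = −a, so ¬ (rA) + (rB) + 1 = (rB) − (rA). A minimal C sketch, with hypothetical function names, that also shows why the sub simplified mnemonic swaps its operands:

```c
#include <assert.h>
#include <stdint.h>

/* subf rD,rA,rB: rD <- ~(rA) + (rB) + 1, which equals (rB) - (rA) mod 2^32. */
uint32_t subf(uint32_t ra, uint32_t rb)
{
    return ~ra + rb + 1u;
}

/* sub rD,rA,rB is subf rD,rB,rA, so it computes (rA) - (rB). */
uint32_t sub(uint32_t ra, uint32_t rb)
{
    return subf(rb, ra);
}
```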


Other registers altered: 
• Condition Register (CR0 field):





Affected: LT, GT, EQ, SO (if Rc = 1)
• XER:
Affected: SO, OV (if OE = 1) 
Simplified mnemonics: 
sub rD,rA,rB equivalent to subf rD,rB,rA 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XO 






















subfcx subfcx 


Subtract from Carrying 


subfc rD,rA,rB (OE = 0 Rc = 0)
subfc. rD,rA,rB (OE = 0 Rc = 1)
subfco rD,rA,rB (OE = 1 Rc = 0)
subfco. rD,rA,rB (OE = 1 Rc = 1)


[POWER mnemonics: sf, sf., sfo, sfo.] 


31 D A B OE 8 Rc
0 5 6 10 11 15 16 20 21 22 30 31 





rD ← ¬ (rA) + (rB) + 1


The sum ¬ (rA) + (rB) + 1 is placed into rD.
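For subfc, XER[CA] is the carry out of the 32-bit sum ¬ (rA) + (rB) + 1. Widening the sum to 64 bits makes that carry visible. In this sketch CA comes out set exactly when (rB) ≥ (rA) as unsigned values, so it behaves as a "no borrow" flag. The `SubResult` type and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the value written to rD and the new XER[CA]. */
typedef struct {
    uint32_t rd;
    bool ca;
} SubResult;

/* subfc rD,rA,rB: rD <- ~(rA) + (rB) + 1, CA <- carry out of the 32-bit sum. */
SubResult subfc(uint32_t ra, uint32_t rb)
{
    uint64_t sum = (uint64_t)(uint32_t)~ra + rb + 1u;
    SubResult r = { (uint32_t)sum, (sum >> 32) != 0 };
    return r;
}
```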


Other registers altered: 
• Condition Register (CR0 field):





Affected: LT, GT, EQ, SO (if Rc = 1)
Note: CR0 field may not reflect the infinitely precise result if overflow occurs (see
XER below).
• XER:
Affected: CA
Affected: SO, OV (if OE = 1)
Simplified mnemonics: 
subc rD,rA,rB equivalent to subfc rD,rB,rA
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XO 






















subfex subfex 


Subtract from Extended 


subfe rD,rA,rB (OE = 0 Rc = 0)
subfe. rD,rA,rB (OE = 0 Rc = 1)
subfeo rD,rA,rB (OE = 1 Rc = 0)
subfeo. rD,rA,rB (OE = 1 Rc = 1)


[POWER mnemonics: sfe, sfe., sfeo, sfeo.] 





31 D A B OE 136 Rc
0 5 6 10 11 15 16 20 21 22 30 31 


rD ← ¬ (rA) + (rB) + XER[CA]


The sum ¬ (rA) + (rB) + XER[CA] is placed into rD.
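A typical use of subfe is multiword subtraction: subfc handles the low words and sets CA, and subfe consumes CA for the next word up. The C sketch below assumes 64-bit values split into two 32-bit halves; function names are hypothetical, and subfc is modeled as subfe with an incoming CA of 1 (which matches the two formulas above).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* subfe rD,rA,rB: rD <- ~(rA) + (rB) + CA; CA <- carry out of the sum. */
uint32_t subfe(uint32_t ra, uint32_t rb, bool *ca)
{
    uint64_t sum = (uint64_t)(uint32_t)~ra + rb + (*ca ? 1u : 0u);
    *ca = (sum >> 32) != 0;
    return (uint32_t)sum;
}

/* 64-bit b - a on 32-bit halves: subfc on the low words, subfe on the high
   words. subfc is modeled here as subfe with an incoming CA of 1. */
uint64_t sub64(uint64_t b, uint64_t a)
{
    bool ca = true;
    uint32_t lo = subfe((uint32_t)a, (uint32_t)b, &ca);
    uint32_t hi = subfe((uint32_t)(a >> 32), (uint32_t)(b >> 32), &ca);
    return ((uint64_t)hi << 32) | lo;
}
```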


Other registers altered: 
• Condition Register (CR0 field):





Affected: LT, GT, EQ, SO (if Rc = 1)
Note: CR0 field may not reflect the infinitely precise result if overflow occurs (see
XER below).
• XER:

Affected: CA
Affected: SO, OV (if OE = 1)

PowerPC Architecture Level Supervisor Level Optional Form 

UISA XO 






















subfic subfic 


Subtract from Immediate Carrying 


subfic rD,rA,SIMM 
[POWER mnemonic: sfi] 


08 D A SIMM
0 5 6 10 11 15 16 31


rD ← ¬ (rA) + EXTS(SIMM) + 1


The sum ¬ (rA) + EXTS(SIMM) + 1 is placed into rD.
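Since ¬ (rA) + 1 = −(rA), subfic computes EXTS(SIMM) − (rA), and with SIMM = 0 it negates rA while setting CA. A C sketch under the same assumptions as the other examples here (hypothetical function name, CA returned through a pointer):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* subfic rD,rA,SIMM: rD <- ~(rA) + EXTS(SIMM) + 1, i.e. EXTS(SIMM) - (rA).
   XER[CA] receives the carry out of the 32-bit sum. */
uint32_t subfic(uint32_t ra, int16_t simm, bool *ca)
{
    uint32_t exts = (uint32_t)(int32_t)simm;  /* EXTS(SIMM) */
    uint64_t sum = (uint64_t)(uint32_t)~ra + exts + 1u;
    *ca = (sum >> 32) != 0;
    return (uint32_t)sum;
}
```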


Other registers altered: 
• XER:
Affected: CA 


PowerPC Architecture Level Supervisor Level Optional Form 





UISA D 






















subfmex subfmex 


Subtract from Minus One Extended 


subfme rD,rA (OE = 0 Rc = 0)
subfme. rD,rA (OE = 0 Rc = 1)
subfmeo rD,rA (OE = 1 Rc = 0)
subfmeo. rD,rA (OE = 1 Rc = 1)


[POWER mnemonics: sfme, sfme., sfmeo, sfmeo.] 




















[_] Reserved
31 D A 00000 OE 232 Rc
0 5 6 10 11 15 16 20 21 22 30 31
rD ← ¬ (rA) + XER[CA] - 1
The sum ¬ (rA) + XER[CA] + (32)1 is placed into rD.
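The (32)1 term is 0xFFFFFFFF, that is −1, so subfme computes ¬ (rA) + CA − 1 = −1 − (rA) − (1 − CA): minus one, minus rA, minus any borrow recorded in CA. A C sketch with a hypothetical function name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* subfme rD,rA: rD <- ~(rA) + CA + 0xFFFFFFFF (the (32)1 term, i.e. -1).
   CA <- carry out of the 32-bit sum. */
uint32_t subfme(uint32_t ra, bool *ca)
{
    uint64_t sum = (uint64_t)(uint32_t)~ra + (*ca ? 1u : 0u) + 0xFFFFFFFFu;
    *ca = (sum >> 32) != 0;
    return (uint32_t)sum;
}
```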
Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)
Note: CR0 field may not reflect the infinitely precise result if overflow occurs (see
XER below).
• XER:
Affected: CA 
Affected: SO, OV (if OE = 1) 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XO 






















subfzex subfzex 


Subtract from Zero Extended 


subfze rD,rA (OE = 0 Rc = 0)
subfze. rD,rA (OE = 0 Rc = 1)
subfzeo rD,rA (OE = 1 Rc = 0)
subfzeo. rD,rA (OE = 1 Rc = 1)


[POWER mnemonics: sfze, sfze., sfzeo, sfzeo.]





[_] Reserved
31 D A 00000 OE 200 Rc
0 5 6 10 11 15 16 20 21 22 30 31

rD ← ¬ (rA) + XER[CA]
The sum ¬ (rA) + XER[CA] is placed into rD.
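One common use of subfze is negating a multiword integer: subfic rD,rA,0 negates the low word and sets CA, then subfze propagates that carry into the high word. A C sketch assuming a 64-bit value split into two 32-bit halves (function names hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* subfze rD,rA: rD <- ~(rA) + CA; CA <- carry out of the sum. */
uint32_t subfze(uint32_t ra, bool *ca)
{
    uint64_t sum = (uint64_t)(uint32_t)~ra + (*ca ? 1u : 0u);
    *ca = (sum >> 32) != 0;
    return (uint32_t)sum;
}

/* Negate a 64-bit value held in two 32-bit words:
   low word as subfic rD,rA,0 (~lo + 0 + 1), high word with subfze. */
uint64_t neg64(uint64_t x)
{
    uint64_t lo_sum = (uint64_t)(uint32_t)~(uint32_t)x + 1u;
    bool ca = (lo_sum >> 32) != 0;
    uint32_t hi = subfze((uint32_t)(x >> 32), &ca);
    return ((uint64_t)hi << 32) | (uint32_t)lo_sum;
}
```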
Other registers altered: 
• Condition Register (CR0 field):
Affected: LT, GT, EQ, SO (if Rc = 1)
Note: CR0 field may not reflect the infinitely precise result if overflow occurs (see
XER below).
• XER:
Affected: CA 
Affected: SO, OV (if OE = 1) 
PowerPC Architecture Level Supervisor Level Optional Form 
UISA XO 






















sync sync 


Synchronize 


[POWER mnemonic: dcs]


[_] Reserved
31 00000 00000 00000 598 0
0 5 6 10 11 15 16 20 21 30 31


The sync instruction provides an ordering function for the effects of all instructions
executed by a given processor. Executing a sync instruction ensures that all instructions
preceding the sync instruction appear to have completed before the sync instruction
completes, and that no subsequent instructions are initiated by the processor until after the
sync instruction completes. When the sync instruction completes, all external accesses
caused by instructions preceding the sync instruction will have been performed with
respect to all other mechanisms that access memory. For more information on how the sync
instruction affects the VEA, refer to Chapter 5, “Cache Model and Memory Coherency.”


Multiprocessor implementations also send a sync address-only broadcast that is useful in
some designs. For example, if a design has an external buffer that reorders loads and stores
for better bus efficiency, the sync broadcast signals to that buffer that previous loads and
stores must be completed before any following loads and stores.


The sync instruction can be used to ensure that the results of all stores into a data structure, 
caused by store instructions executed in a “critical section” of a program, are seen by other 
processors before the data structure is seen as unlocked. 
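The critical-section pattern described above can be sketched with C11 atomics. This is an illustration under assumptions, not the manual's code: the `struct shared` layout and function name are hypothetical, and the claim that a sequentially consistent fence is emitted as sync reflects typical 32-bit PowerPC compilers rather than anything the architecture guarantees. As noted below, eieio or a lighter sequence may suffice in many cases.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared structure guarded by a lock word (0 = unlocked). */
struct shared {
    int data[4];
    atomic_int lock;
};

/* Finish a critical section: write the data, then release the lock. */
void update_and_unlock(struct shared *s, int v)
{
    for (int i = 0; i < 4; i++)
        s->data[i] = v;               /* stores inside the critical section */

    /* Full fence before the unlocking store. On 32-bit PowerPC, compilers
       typically emit this as sync, so the data stores are performed before
       the lock word is seen as unlocked. */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&s->lock, 0, memory_order_relaxed);
}
```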


The functions performed by the sync instruction normally take a significant amount of
time to complete, so indiscriminate use of this instruction may adversely affect
performance. In addition, the time required to execute sync may vary from one execution
to another.


The eieio instruction may be more appropriate than sync for many cases. 


This instruction is execution synchronizing. For more information on execution 
synchronization, see Section 4.1.5, “Synchronizing Instructions.” 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA X 






















tlbia tlbia


Translation Lookaside Buffer Invalidate All 








[_] Reserved
31 00000 00000 00000 370 0
0 5 6 10 11 15 16 20 21 30 31


All TLB entries ← invalid


The entire translation lookaside buffer (TLB) is invalidated (that is, all entries are 
removed). 


The TLB is invalidated regardless of the settings of MSR[IR] and MSR[DR]. The 
invalidation is done without reference to the SLB, segment table, or segment registers. 


This instruction does not cause the entries to be invalidated in other processors. 
This is a supervisor-level instruction and optional in the PowerPC architecture. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 


OEA √ √ X

























tlbie tlbie 


Translation Lookaside Buffer Invalidate Entry 


tlbie rB
[POWER mnemonic: tlbi] 





Reserved 











31 00000 00000 B 306 0
0 5 6 10 11 15 16 20 21 30 31 





VPS ← rB[4-19]
Identify TLB entries corresponding to VPS
Each such TLB entry ← invalid


EA is the contents of rB. If the translation lookaside buffer (TLB) contains an entry 
corresponding to EA, that entry is made invalid (that is, removed from the TLB). 
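The notation rB[4-19] uses PowerPC bit numbering, in which bit 0 is the most significant bit. Assuming the 32-bit OEA layout with 4-Kbyte pages, bits 4-19 of the EA are the page index within a segment and bits 20-31 are the byte offset. A C sketch of the field extraction (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits first..last of a 32-bit word using PowerPC bit numbering,
   where bit 0 is the most significant bit. */
uint32_t ppc_bits(uint32_t word, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> (31 - last)) & mask;
}
```

With this helper, the VPS used by tlbie is `ppc_bits(rb, 4, 19)`, which is the same as `(rb >> 12) & 0xFFFF`.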


Multiprocessing implementations (for example, the 601 and 604) send a tlbie address-only
broadcast over the address bus to tell other processors to invalidate the same TLB entry in
their TLBs.


The TLB search is done regardless of the settings of MSR[IR] and MSR[DR]. The search 
is done based on a portion of the logical page number within a segment, without reference 
to the segment registers. All entries matching the search criteria are invalidated. 


Block address translation for EA, if any, is ignored. Refer to Section 7.5.3.4, 
“Synchronization of Memory Accesses and Referenced and Changed Bit Updates,” and 
Section 7.6.3, “Page Table Updates,” for other requirements associated with the use of this 
instruction. 


This is a supervisor-level instruction and optional in the PowerPC architecture. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





OEA √ √ X






















tlbsync tlbsync 


TLB Synchronize 








[_] Reserved
31 00000 00000 00000 566 0
0 5 6 10 11 15 16 20 21 30 31


If an implementation sends a broadcast for tlbie, it will also send a broadcast for
tlbsync. Executing a tlbsync instruction ensures that all tlbie instructions previously
executed by the processor executing the tlbsync instruction have completed on all other
processors.


The operation performed by this instruction is treated as a caching-inhibited and guarded 
data access with respect to the ordering done by eieio. 


Note that the 601 expands the use of the sync instruction to cover tlbsync functionality. 


Refer to Section 7.5.3.4, “Synchronization of Memory Accesses and Referenced and 
Changed Bit Updates,” and Section 7.6.3, “Page Table Updates,” for other requirements 
associated with the use of this instruction. 


This instruction is supervisor-level and optional in the PowerPC architecture. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 


OEA √ √ X

























tw tw


Trap Word


tw TO,rA,rB
[POWER mnemonic: t]

















[_] Reserved
31 TO A B 4 0
0 5 6 10 11 15 16 20 21 30 31
a ← EXTS(rA)
b ← EXTS(rB)
if (a < b) & TO[0] then TRAP
if (a > b) & TO[1] then TRAP
if (a = b) & TO[2] then TRAP
if (a <U b) & TO[3] then TRAP
if (a >U b) & TO[4] then TRAP


The contents of rA are compared with the contents of rB. If any bit in the TO field is set
and its corresponding condition is met by the result of the comparison, then the system trap
handler is invoked.
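The TO field is a 5-bit mask in PowerPC bit numbering, so TO[0] (signed less than) has the value 16 and TO[4] (logically greater than) has the value 1. A C sketch of the trap decision, with hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* TO bit values: TO[0] is the most significant of the five bits. */
enum { TO_LT = 16, TO_GT = 8, TO_EQ = 4, TO_LTU = 2, TO_GTU = 1 };

/* Returns true if tw TO,rA,rB would invoke the system trap handler. */
bool tw_traps(unsigned to, uint32_t ra, uint32_t rb)
{
    int32_t a = (int32_t)ra, b = (int32_t)rb;  /* signed view */
    return ((to & TO_LT)  && a < b)   ||
           ((to & TO_GT)  && a > b)   ||
           ((to & TO_EQ)  && a == b)  ||
           ((to & TO_LTU) && ra < rb) ||       /* unsigned view */
           ((to & TO_GTU) && ra > rb);
}
```

With these values, tweq corresponds to TO = 4, twlge to TO = 5 (equal or logically greater), and trap to TO = 31.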


Other registers altered: 


• None


Simplified mnemonics:
tweq rA,rB equivalent to tw 4,rA,rB
twlge rA,rB equivalent to tw 5,rA,rB
trap equivalent to tw 31,0,0


PowerPC Architecture Level Supervisor Level Optional Form
UISA X






















twi twi 


Trap Word Immediate 


twi TO,rA,SIMM 
[POWER mnemonic: ti] 


03 TO A SIMM 
0 5 6 10 11 15 16 31 





a ← EXTS(rA)

if (a < EXTS(SIMM)) & TO[0] then TRAP 
if (a > EXTS(SIMM)) & TO[1] then TRAP 
if (a = EXTS(SIMM)) & TO[2] then TRAP 
if (a <U EXTS(SIMM)) & TO[3] then TRAP 
if (a >U EXTS(SIMM)) & TO[4] then TRAP 


The contents of rA are compared with the sign-extended value of the SIMM field. If any 
bit in the TO field is set and its corresponding condition is met by the result of the 
comparison, then the system trap handler is invoked. 


Other registers altered: 


• None


Simplified mnemonics: 





twgti rA,value equivalent to twi 8,rA,value
twllei rA,value equivalent to twi 6,rA,value
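Because twi compares against EXTS(SIMM), a negative immediate becomes a large value in the logical (unsigned) comparisons; for example, SIMM = -1 is compared as 0xFFFFFFFF. A C sketch of the decision, with a hypothetical function name and the same TO bit values as tw:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if twi TO,rA,SIMM would invoke the trap handler.
   TO bit values: TO[0] = 16, TO[1] = 8, TO[2] = 4, TO[3] = 2, TO[4] = 1. */
bool twi_traps(unsigned to, uint32_t ra, int16_t simm)
{
    int32_t a = (int32_t)ra;
    int32_t b = (int32_t)simm;   /* EXTS(SIMM) */
    uint32_t ub = (uint32_t)b;   /* same bits, compared unsigned */
    return ((to & 16u) && a < b)   ||
           ((to & 8u)  && a > b)   ||
           ((to & 4u)  && a == b)  ||
           ((to & 2u)  && ra < ub) ||
           ((to & 1u)  && ra > ub);
}
```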
PowerPC Architecture Level Supervisor Level Optional Form 
UISA D 






















xorx xorx


XOR
xor rA,rS,rB (Rc = 0)
xor. rA,rS,rB (Rc = 1)
31 S A B 316 Rc
0 5 6 10 11 15 16 20 21 30 31


rA ← (rS) ⊕ (rB)
The contents of rS are XORed with the contents of rB and the result is placed into rA.


Other registers altered: 
• Condition Register (CR0 field):





Affected: LT, GT, EQ, SO (if Rc = 1)
PowerPC Architecture Level Supervisor Level Optional Form 
UISA X 






















xori xori 


XOR Immediate 


xori rA,rS,UIMM
[POWER mnemonic: xoril] 


26 S A UIMM
0 5 6 10 11 15 16 31 





rA ← (rS) ⊕ ((16)0 || UIMM)
The contents of rS are XORed with 0x0000 || UIMM and the result is placed into rA. 


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 








UISA D 



















xoris xoris 


XOR Immediate Shifted 


xoris rA,rS,UIMM


[POWER mnemonic: xoriu] 


27 S A UIMM
0 5 6 10 11 15 16 31 





rA ← (rS) ⊕ (UIMM || (16)0)
The contents of rS are XORed with UIMM || 0x0000 and the result is placed into rA.
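Since xori touches only the low halfword and xoris only the high halfword, a pair of them XORs a register with an arbitrary 32-bit constant, and neither alters CR0. A C sketch of the two operations and the pairing (function names hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* xori rA,rS,UIMM: rA <- (rS) XOR ((16)0 || UIMM) */
uint32_t xori(uint32_t rs, uint16_t uimm)
{
    return rs ^ (uint32_t)uimm;
}

/* xoris rA,rS,UIMM: rA <- (rS) XOR (UIMM || (16)0) */
uint32_t xoris(uint32_t rs, uint16_t uimm)
{
    return rs ^ ((uint32_t)uimm << 16);
}

/* XOR with an arbitrary 32-bit constant k: xoris for the high half,
   then xori for the low half. */
uint32_t xor_const32(uint32_t rs, uint32_t k)
{
    return xori(xoris(rs, (uint16_t)(k >> 16)), (uint16_t)(k & 0xFFFFu));
}
```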


Other registers altered: 


• None


PowerPC Architecture Level Supervisor Level Optional Form 





UISA D 
Appendix A


PowerPC Instruction Set Listings 


This appendix lists the PowerPC architecture’s instruction set. Instructions are sorted by
mnemonic, opcode, function, and form. Also included in this appendix is a quick reference
table that contains general information, such as the architecture level, privilege level, and
form, and indicates if the instruction is optional.


Note that split fields, which represent the concatenation of sequences from left to right, are
shown in lowercase. For more information, refer to Chapter 8, “Instruction Set.”
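The spr field of mfspr and mtspr is an example of a split field: the 10-bit field holds the two 5-bit halves of the SPR number in swapped order. The C sketch below assumes the standard numbering in which LR is SPR 8 (so the result can be checked against the well-known mflr r0 encoding); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The 10-bit spr field holds the SPR number with its 5-bit halves swapped. */
uint32_t spr_field(unsigned spr)
{
    return ((spr & 0x1Fu) << 5) | ((spr >> 5) & 0x1Fu);
}

/* Assemble mfspr rD,SPR: primary opcode 31, extended opcode 339. */
uint32_t encode_mfspr(unsigned rd, unsigned spr)
{
    return (31u << 26) | ((rd & 0x1Fu) << 21) | (spr_field(spr) << 11) | (339u << 1);
}
```

Because swapping the halves twice restores the original number, the same helper also decodes the field.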


A.1 Instructions Sorted by Mnemonic 


Table A-1 lists the instructions implemented in the PowerPC architecture in alphabetical
order by mnemonic.


Key:


[_] Reserved bits


Table A-1. Complete Instruction List Sorted by Mnemonic 







































































Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
addx 31 D A B OE 266 Rc
addcx 31 D A B OE 10 Rc
addex 31 D A B OE 138 Rc
addi 14 D A SIMM
addic 12 D A SIMM
addic. 13 D A SIMM
addis 15 D A SIMM
addmex 31 D A 00000 OE 234 Rc
addzex 31 D A 00000 OE 202 Rc
andx 31 S A B 28 Rc
andcx 31 S A B 60 Rc
andi. 28 S A UIMM
Appendix A. PowerPC Instruction Set Listings A-1 





andis. 29 S A UIMM
bx 18 LI AA LK
bcx 16 BO BI BD AA LK
bcctrx 19 BO BI 00000 528 LK
bclrx 19 BO BI 00000 16 LK
cmp 31 crfD 0 L A B 0 0
cmpi 11 crfD 0 L A SIMM
cmpl 31 crfD 0 L A B 32 0
cmpli 10 crfD 0 L A UIMM
cntlzwx 31 S A 00000 26 Rc
crand 19 crbD crbA crbB 257 0
crandc 19 crbD crbA crbB 129 0
creqv 19 crbD crbA crbB 289 0
crnand 19 crbD crbA crbB 225 0
crnor 19 crbD crbA crbB 33 0
cror 19 crbD crbA crbB 449 0
crorc 19 crbD crbA crbB 417 0
crxor 19 crbD crbA crbB 193 0
dcba¹ 31 00000 A B 758 0
dcbf 31 00000 A B 86 0
dcbi² 31 00000 A B 470 0
dcbst 31 00000 A B 54 0
dcbt 31 00000 A B 278 0
dcbtst 31 00000 A B 246 0
dcbz 31 00000 A B 1014 0
divwx 31 D A B OE 491 Rc
divwux 31 D A B OE 459 Rc
eciwx 31 D A B 310 0
ecowx 31 S A B 438 0
eieio 31 00000 00000 00000 854 0
eqvx 31 S A B 284 Rc
extsbx 31 S A 00000 954 Rc
extshx 31 S A 00000 922 Rc
fabsx 63 D 00000 B 264 Rc
faddx 63 D A B 00000 21 Rc
faddsx 59 D A B 00000 21 Rc
fcmpo 63 crfD 00 A B 32 0
fcmpu 63 crfD 00 A B 0 0
fctiwx 63 D 00000 B 14 Rc
fctiwzx 63 D 00000 B 15 Rc
fdivx 63 D A B 00000 18 Rc
fdivsx 59 D A B 00000 18 Rc
fmaddx 63 D A B C 29 Rc
fmaddsx 59 D A B C 29 Rc
fmrx 63 D 00000 B 72 Rc
fmsubx 63 D A B C 28 Rc
fmsubsx 59 D A B C 28 Rc
fmulx 63 D A 00000 C 25 Rc
fmulsx 59 D A 00000 C 25 Rc
fnabsx 63 D 00000 B 136 Rc
fnegx 63 D 00000 B 40 Rc
fnmaddx 63 D A B C 31 Rc
fnmaddsx 59 D A B C 31 Rc
fnmsubx 63 D A B C 30 Rc
fnmsubsx 59 D A B C 30 Rc
fresx¹ 59 D 00000 B 00000 24 Rc
frspx 63 D 00000 B 12 Rc
frsqrtex¹ 63 D 00000 B 00000 26 Rc
fselx¹ 63 D A B C 23 Rc
fsqrtx¹ 63 D 00000 B 00000 22 Rc
fsqrtsx¹ 59 D 00000 B 00000 22 Rc
fsubx 63 D A B 00000 20 Rc
fsubsx 59 D A B 00000 20 Rc
icbi 31 00000 A B 982 0
isync 19 00000 00000 00000 150 0
lbz 34 D A d
lbzu 35 D A d
lbzux 31 D A B 119 0
lbzx 31 D A B 87 0
lfd 50 D A d
lfdu 51 D A d
lfdux 31 D A B 631 0
lfdx 31 D A B 599 0
lfs 48 D A d
lfsu 49 D A d
lfsux 31 D A B 567 0
lfsx 31 D A B 535 0
lha 42 D A d
lhau 43 D A d
lhaux 31 D A B 375 0
lhax 31 D A B 343 0
lhbrx 31 D A B 790 0
lhz 40 D A d
lhzu 41 D A d
lhzux 31 D A B 311 0
lhzx 31 D A B 279 0
lmw³ 46 D A d
lswi³ 31 D A NB 597 0
lswx³ 31 D A B 533 0
lwarx 31 D A B 20 0
lwbrx 31 D A B 534 0
lwz 32 D A d
lwzu 33 D A d
lwzux 31 D A B 55 0
lwzx 31 D A B 23 0
mcrf 19 crfD 00 crfS 00 00000 0 0
mcrfs 63 crfD 00 crfS 00 00000 64 0
mcrxr 31 crfD 00 00000 00000 512 0
mfcr 31 D 00000 00000 19 0


























mffsx 63 D 00000 00000 583 Rc
mfmsr² 31 D 00000 00000 83 0
mfspr⁴ 31 D spr 339 0
mfsr² 31 D 0 SR 00000 595 0
mfsrin² 31 D 00000 B 659 0
mftb 31 D tbr 371 0
mtcrf 31 S 0 CRM 0 144 0
mtfsb0x 63 crbD 00000 00000 70 Rc
mtfsb1x 63 crbD 00000 00000 38 Rc
mtfsfx 63 0 FM 0 B 711 Rc
mtfsfix 63 crfD 00 00000 IMM 0 134 Rc
mtmsr² 31 S 00000 00000 146 0
mtspr⁴ 31 S spr 467 0
mtsr² 31 S 0 SR 00000 210 0
mtsrin² 31 S 00000 B 242 0
mulhwx 31 D A B 0 75 Rc
mulhwux 31 D A B 0 11 Rc
mulli 7 D A SIMM
mullwx 31 D A B OE 235 Rc
nandx 31 S A B 476 Rc
negx 31 D A 00000 OE 104 Rc
norx 31 S A B 124 Rc
orx 31 S A B 444 Rc
orcx 31 S A B 412 Rc
ori 24 S A UIMM
oris 25 S A UIMM
rfi² 19 00000 00000 00000 50 0
rlwimix 20 S A SH MB ME Rc
rlwinmx 21 S A SH MB ME Rc
rlwnmx 23 S A B MB ME Rc
sc 17 00000 00000 00000000000000 1 0
slwx 31 S A B 24 Rc
srawx 31 S A B 792 Rc
srawix 31 S A SH 824 Rc
srwx 31 S A B 536 Rc
stb 38 S A d
stbu 39 S A d
stbux 31 S A B 247 0
stbx 31 S A B 215 0
stfd 54 S A d
stfdu 55 S A d
stfdux 31 S A B 759 0
stfdx 31 S A B 727 0
stfiwx¹ 31 S A B 983 0
stfs 52 S A d
stfsu 53 S A d
stfsux 31 S A B 695 0
stfsx 31 S A B 663 0
sth 44 S A d
sthbrx 31 S A B 918 0
sthu 45 S A d
sthux 31 S A B 439 0
sthx 31 S A B 407 0
stmw³ 47 S A d
stswi³ 31 S A NB 725 0
stswx³ 31 S A B 661 0
stw 36 S A d
stwbrx 31 S A B 662 0
stwcx. 31 S A B 150 1
stwu 37 S A d
stwux 31 S A B 183 0
stwx 31 S A B 151 0
subfx 31 D A B OE 40 Rc
subfcx 31 D A B OE 8 Rc
subfex 31 D A B OE 136 Rc
subfic 08 D A SIMM
subfmex 31 D A 00000 OE 232 Rc
subfzex 31 D A 00000 OE 200 Rc
sync 31 00000 00000 00000 598 0
tlbia¹,² 31 00000 00000 00000 370 0
tlbie¹,² 31 00000 00000 B 306 0
tlbsync¹,² 31 00000 00000 00000 566 0
tw 31 TO A B 4 0
twi 03 TO A SIMM
xorx 31 S A B 316 Rc
xori 26 S A UIMM
xoris 27 S A UIMM
Notes: 
1 Optional instruction 
2 Supervisor-level instruction 
3 Load/store string/multiple instruction 
4 Supervisor- and user-level instruction 
A.2 Instructions Sorted by Opcode


Table A-2 lists the instructions defined in the PowerPC architecture in numeric order by
opcode.
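Sorting by opcode means ordering first by the primary opcode in bits 0-5 and, within primary opcode 31 (011111), by the extended opcode in bits 21-30. A C sketch of that field decoding, plus a hypothetical X-form encoder used to check it against the stwx row (primary opcode 31, extended opcode 151); all function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits first..last (PowerPC numbering, bit 0 = MSB; width < 32). */
uint32_t field(uint32_t insn, unsigned first, unsigned last)
{
    return (insn >> (31 - last)) & ((1u << (last - first + 1)) - 1u);
}

uint32_t primary_opcode(uint32_t insn)    { return field(insn, 0, 5); }
uint32_t extended_opcode_x(uint32_t insn) { return field(insn, 21, 30); }

/* Hypothetical X-form encoder, e.g. for stwx rS,rA,rB. */
uint32_t encode_x(unsigned op, unsigned s, unsigned a, unsigned b,
                  unsigned xo, unsigned rc)
{
    return (op << 26) | (s << 21) | (a << 16) | (b << 11) | (xo << 1) | rc;
}
```

For example, `encode_x(31, 3, 4, 5, 151, 0)` assembles stwx r3,r4,r5, and decoding it recovers opcode 31 and extended opcode 151.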


Key:


[_] Reserved bits


Table A-2. Complete Instruction List Sorted by Opcode


Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
twi 000011 TO A SIMM
mulli 000111 D A SIMM
subfic 001000 D A SIMM
cmpli 001010 crfD 0 L A UIMM
cmpi 001011 crfD 0 L A SIMM
addic 001100 D A SIMM
addic. 001101 D A SIMM
addi 001110 D A SIMM
addis 001111 D A SIMM
bcx 010000 BO BI BD AA LK
sc 010001 00000 00000 00000000000000 1 0
bx 010010 LI AA LK
mcrf 010011 crfD 00 crfS 00 00000 0000000000 0
bclrx 010011 BO BI 00000 0000010000 LK
crnor 010011 crbD crbA crbB 0000100001 0
rfi¹ 010011 00000 00000 00000 0000110010 0
crandc 010011 crbD crbA crbB 0010000001 0
isync 010011 00000 00000 00000 0010010110 0
crxor 010011 crbD crbA crbB 0011000001 0
crnand 010011 crbD crbA crbB 0011100001 0
crand 010011 crbD crbA crbB 0100000001 0
creqv 010011 crbD crbA crbB 0100100001 0
crorc 010011 crbD crbA crbB 0110100001 0
cror 010011 crbD crbA crbB 0111000001 0
bcctrx 010011 BO BI 00000 1000010000 LK
rlwimix 010100 S A SH MB ME Rc














rlwinmx 010101 S A SH MB ME Rc
rlwnmx 010111 S A B MB ME Rc
ori 011000 S A UIMM
oris 011001 S A UIMM
xori 011010 S A UIMM
xoris 011011 S A UIMM
andi. 011100 S A UIMM
andis. 011101 S A UIMM
cmp 011111 crfD 0 L A B 0000000000 0
tw 011111 TO A B 0000000100 0
subfcx 011111 D A B OE 0000001000 Rc
addcx 011111 D A B OE 0000001010 Rc
mulhwux 011111 D A B 0 0000001011 Rc
mfcr 011111 D 00000 00000 0000010011 0
lwarx 011111 D A B 0000010100 0
lwzx 011111 D A B 0000010111 0
slwx 011111 S A B 0000011000 Rc
cntlzwx 011111 S A 00000 0000011010 Rc
andx 011111 S A B 0000011100 Rc
cmpl 011111 crfD 0 L A B 0000100000 0
subfx 011111 D A B OE 0000101000 Rc
dcbst 011111 00000 A B 0000110110 0
lwzux 011111 D A B 0000110111 0
andcx 011111 S A B 0000111100 Rc
mulhwx 011111 D A B 0 0001001011 Rc
mfmsr¹ 011111 D 00000 00000 0001010011 0
dcbf 011111 00000 A B 0001010110 0
lbzx 011111 D A B 0001010111 0
negx 011111 D A 00000 OE 0001101000 Rc
lbzux 011111 D A B 0001110111 0
norx 011111 S A B 0001111100 Rc
subfex 011111 D A B OE 0010001000 Rc
addex 011111 D A B OE 0010001010 Rc
mtcrf 011111 S 0 CRM 0 0010010000 0
mtmsr¹ 011111 S 00000 00000 0010010010 0
stwcx. 011111 S A B 0010010110 1
stwx 011111 S A B 0010010111 0
stwux 011111 S A B 0010110111 0
subfzex 011111 D A 00000 OE 0011001000 Rc
addzex 011111 D A 00000 OE 0011001010 Rc
mtsr¹ 011111 S 0 SR 00000 0011010010 0
stbx 011111 S A B 0011010111 0
subfmex 011111 D A 00000 OE 0011101000 Rc
addmex 011111 D A 00000 OE 0011101010 Rc
mullwx 011111 D A B OE 0011101011 Rc
mtsrin¹ 011111 S 00000 B 0011110010 0
dcbtst 011111 00000 A B 0011110110 0
stbux 011111 S A B 0011110111 0
addx 011111 D A B OE 0100001010 Rc
dcbt 011111 00000 A B 0100010110 0
lhzx 011111 D A B 0100010111 0
eqvx 011111 S A B 0100011100 Rc
tlbie¹,² 011111 00000 00000 B 0100110010 0
eciwx 011111 D A B 0100110110 0
lhzux 011111 D A B 0100110111 0
xorx 011111 S A B 0100111100 Rc
mfspr³ 011111 D spr 0101010011 0
lhax 011111 D A B 0101010111 0
tlbia¹,² 011111 00000 00000 00000 0101110010 0
mftb 011111 D tbr 0101110011 0
lhaux 011111 D A B 0101110111 0
sthx 011111 S A B 0110010111 0
orcx 011111 S A B 0110011100 Rc
ecowx 011111 S A B 0110110110 0
sthux 011111 S A B 0110110111 0
orx 011111 S A B 0110111100 Rc
divwux 011111 D A B OE 0111001011 Rc
mtspr³ 011111 S spr 0111010011 0
dcbi¹ 011111 00000 A B 0111010110 0
nandx 011111 S A B 0111011100 Rc
divwx 011111 D A B OE 0111101011 Rc
mcrxr 011111 crfD 00 00000 00000 1000000000 0
lswx⁴ 011111 D A B 1000010101 0
lwbrx 011111 D A B 1000010110 0
lfsx 011111 D A B 1000010111 0
srwx 011111 S A B 1000011000 Rc
tlbsync¹,² 011111 00000 00000 00000 1000110110 0
lfsux 011111 D A B 1000110111 0
mfsr¹ 011111 D 0 SR 00000 1001010011 0
lswi⁴ 011111 D A NB 1001010101 0
sync 011111 00000 00000 00000 1001010110 0
lfdx 011111 D A B 1001010111 0
lfdux 011111 D A B 1001110111 0
mfsrin¹ 011111 D 00000 B 1010010011 0
stswx⁴ 011111 S A B 1010010101 0
stwbrx 011111 S A B 1010010110 0
stfsx 011111 S A B 1010010111 0
stfsux 011111 S A B 1010110111 0
stswi⁴ 011111 S A NB 1011010101 0
stfdx 011111 S A B 1011010111 0
dcba² 011111 00000 A B 1011110110 0
stfdux 011111 S A B 1011110111 0
lhbrx 011111 D A B 1100010110 0
srawx 011111 S A B 1100011000 Rc
srawix 011111 S A SH 1100111000 Rc
eieio 011111 00000 00000 00000 1101010110 0
sthbrx 011111 S A B 1110010110 0
extshx 011111 S A 00000 1110011010 Rc
extsbx 011111 S A 00000 1110111010 Rc
icbi 011111 00000 A B 1111010110 0
stfiwx² 011111 S A B 1111010111 0
dcbz 011111 00000 A B 1111110110 0
lwz 100000 D A d
lwzu 100001 D A d
lbz 100010 D A d
lbzu 100011 D A d
stw 100100 S A d
stwu 100101 S A d
stb 100110 S A d
stbu 100111 S A d
lhz 101000 D A d
lhzu 101001 D A d
lha 101010 D A d
lhau 101011 D A d
sth 101100 S A d
sthu 101101 S A d
lmw⁴ 101110 D A d
stmw⁴ 101111 S A d
lfs 110000 D A d
lfsu 110001 D A d
lfd 110010 D A d
lfdu 110011 D A d
stfs 110100 S A d
stfsu 110101 S A d
stfd 110110 S A d
stfdu 110111 S A d
fdivsx 111011 D A B 00000 10010 Rc
fsubsx 111011 D A B 00000 10100 Rc
faddsx 111011 D A B 00000 10101 Rc
fsqrtsx² 111011 D 00000 B 00000 10110 Rc
fresx² 111011 D 00000 B 00000 11000 Rc
fmulsx 111011 D A 00000 C 11001 Rc


























fmsubsx 111011 D A B C 11100 Rc
fmaddsx 111011 D A B C 11101 Rc
fnmsubsx 111011 D A B C 11110 Rc
fnmaddsx 111011 D A B C 11111 Rc
fcmpu 111111 crfD 00 A B 0000000000 0
frspx 111111 D 00000 B 0000001100 Rc
fctiwx 111111 D 00000 B 0000001110 Rc
fctiwzx 111111 D 00000 B 0000001111 Rc
fdivx 111111 D A B 00000 10010 Rc
fsubx 111111 D A B 00000 10100 Rc
faddx 111111 D A B 00000 10101 Rc
fsqrtx² 111111 D 00000 B 00000 10110 Rc
fselx² 111111 D A B C 10111 Rc
fmulx 111111 D A 00000 C 11001 Rc
fmsubx 111111 D A B C 11100 Rc
fmaddx 111111 D A B C 11101 Rc
fnmsubx 111111 D A B C 11110 Rc
fnmaddx 111111 D A B C 11111 Rc
fcmpo 111111 crfD 00 A B 0000100000 0
mtfsb1x 111111 crbD 00000 00000 0000100110 Rc
fnegx 111111 D 00000 B 0000101000 Rc
mcrfs 111111 crfD 00 crfS 00 00000 0001000000 0
mtfsb0x 111111 crbD 00000 00000 0001000110 Rc
fmrx 111111 D 00000 B 0001001000 Rc
mtfsfix 111111 crfD 00 00000 IMM 0 0010000110 Rc
fnabsx 111111 D 00000 B 0010001000 Rc
fabsx 111111 D 00000 B 0100001000 Rc
mffsx 111111 D 00000 00000 1001000111 Rc
mtfsfx 111111 0 FM 0 B 1011000111 Rc

Notes: 
1 Supervisor-level instruction 
2 Optional instruction 
3 Supervisor- and user-level instruction 
4 Load/store string/multiple instruction 
A.3 Instructions Grouped by Functional Categories
Table A-3 through Table A-30 list the PowerPC instructions grouped by function. 


Key: [_] Reserved bits


Table A-3. Integer Arithmetic Instructions 

















































































































Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
addx 31 D A B OE 266 Rc
addcx 31 D A B OE 10 Rc
addex 31 D A B OE 138 Rc
addi 14 D A SIMM
addic 12 D A SIMM
addic. 13 D A SIMM
addis 15 D A SIMM
addmex 31 D A 00000 OE 234 Rc
addzex 31 D A 00000 OE 202 Rc
divwx 31 D A B OE 491 Rc
divwux 31 D A B OE 459 Rc
mulhwx 31 D A B 0 75 Rc
mulhwux 31 D A B 0 11 Rc
mulli 07 D A SIMM
mullwx 31 D A B OE 235 Rc
negx 31 D A 00000 OE 104 Rc
subfx 31 D A B OE 40 Rc
subfcx 31 D A B OE 8 Rc
subfic 08 D A SIMM
subfex 31 D A B OE 136 Rc
subfmex 31 D A 00000 OE 232 Rc
subfzex 31 D A 00000 OE 200 Rc
Table A-4. Integer Compare Instructions 
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
cmp 31 crfD 0 L A B 0 0
cmpi 11 crfD 0 L A SIMM
cmpl 31 crfD 0 L A B 32 0
cmpli 10 crfD 0 L A UIMM


Table A-5. Integer Logical Instructions


Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
andx 31 S A B 28 Rc
andcx 31 S A B 60 Rc
andi. 28 S A UIMM
andis. 29 S A UIMM
cntlzwx 31 S A 00000 26 Rc
eqvx 31 S A B 284 Rc
extsbx 31 S A 00000 954 Rc
extshx 31 S A 00000 922 Rc
nandx 31 S A B 476 Rc
norx 31 S A B 124 Rc
orx 31 S A B 444 Rc
orcx 31 S A B 412 Rc
ori 24 S A UIMM
oris 25 S A UIMM
xorx 31 S A B 316 Rc
xori 26 S A UIMM
xoris 27 S A UIMM


Table A-6. Integer Rotate Instructions 











Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
rlwimix 20 S A SH MB ME Rc
rlwinmx 21 S A SH MB ME Rc
rlwnmx 23 S A B MB ME Rc





























Table A-7. Integer Shift Instructions 














Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
slwx 31 S A B 24 Rc
srawx 31 S A B 792 Rc
srawix 31 S A SH 824 Rc
srwx 31 S A B 536 Rc


























A-16 PowerPC Microprocessor Family: The Programming Environments (32-Bit) 


Table A-8. Floating-Point Arithmetic Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
faddx 63 D A B 00000 21 Rc
faddsx 59 D A B 00000 21 Rc
fdivx 63 D A B 00000 18 Rc
fdivsx 59 D A B 00000 18 Rc
fmulx 63 D A 00000 C 25 Rc
fmulsx 59 D A 00000 C 25 Rc
fresx¹ 59 D 00000 B 00000 24 Rc
frsqrtex¹ 63 D 00000 B 00000 26 Rc
fsubx 63 D A B 00000 20 Rc
fsubsx 59 D A B 00000 20 Rc
fselx¹ 63 D A B C 23 Rc
fsqrtx¹ 63 D 00000 B 00000 22 Rc
fsqrtsx¹ 59 D 00000 B 00000 22 Rc

Note:
¹ Optional instruction

Table A-9. Floating-Point Multiply-Add Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
fmaddx 63 D A B C 29 Rc
fmaddsx 59 D A B C 29 Rc
fmsubx 63 D A B C 28 Rc
fmsubsx 59 D A B C 28 Rc
fnmaddx 63 D A B C 31 Rc
fnmaddsx 59 D A B C 31 Rc
fnmsubx 63 D A B C 30 Rc
fnmsubsx 59 D A B C 30 Rc
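In these A-form rows the C field (bits 21-25) follows B in the word, even though the assembler syntax for the multiply-add instructions is frD,frA,frC,frB. A minimal packing sketch, with `a_form` as a hypothetical helper:

```python
# Hypothetical helper for illustration; not part of any assembler API.
def a_form(opcd, d, a, b, c, xo, rc=0):
    """Pack an A-form word: OPCD, D (6-10), A (11-15), B (16-20), C (21-25), XO (26-30), Rc (31)."""
    return ((opcd << 26) | (d << 21) | (a << 16) | (b << 11)
            | (c << 6) | (xo << 1) | rc)


# fmadd f1,f2,f3,f4 computes f1 = f2 * f3 + f4, so A = 2, C = 3, B = 4.
fmadd = a_form(63, d=1, a=2, b=4, c=3, xo=29)   # 0xFC2220FA
# fmul leaves the B field as 00000; fadds (opcode 59) leaves C as 00000.
fmul = a_form(63, d=1, a=2, b=0, c=3, xo=25)    # 0xFC2200F2
fadds = a_form(59, d=1, a=2, b=3, c=0, xo=21)   # 0xEC22182A
```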
Table A-10. Floating-Point Rounding and Conversion Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
fctiwx 63 D 00000 B 14 Rc
fctiwzx 63 D 00000 B 15 Rc
frspx 63 D 00000 B 12 Rc

Table A-11. Floating-Point Compare Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
fcmpo 63 crfD 00 A B 32 0
fcmpu 63 crfD 00 A B 0 0

Table A-12. Floating-Point Status and Control Register Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
mcrfs 63 crfD 00 crfS 00 00000 64 0
mffsx 63 D 00000 00000 583 Rc
mtfsb0x 63 crbD 00000 00000 70 Rc
mtfsb1x 63 crbD 00000 00000 38 Rc
mtfsfx 63 0 FM 0 B 711 Rc
mtfsfix 63 crfD 00 00000 IMM 0 134 Rc





























Table A-13. Integer Load Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
lbz 34 D A d
lbzu 35 D A d
lbzux 31 D A B 119 0
lbzx 31 D A B 87 0
lha 42 D A d
lhau 43 D A d
lhaux 31 D A B 375 0
lhax 31 D A B 343 0
lhz 40 D A d
lhzu 41 D A d
lhzux 31 D A B 311 0
lhzx 31 D A B 279 0
lwz 32 D A d
lwzu 33 D A d
lwzux 31 D A B 55 0
lwzx 31 D A B 23 0
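The D-form loads carry a signed 16-bit displacement d, which is added to rA (or to 0 when rA = 0) to form the effective address. A packing sketch, assuming d is stored in two's complement; `d_form` is a hypothetical helper:

```python
# Hypothetical helper for illustration; not part of any assembler API.
def d_form(opcd, d, a, disp):
    """Pack a D-form load/store word; disp is stored in bits 16-31 in two's complement."""
    if not -0x8000 <= disp <= 0x7FFF:
        raise ValueError("displacement does not fit in a signed 16-bit field")
    return (opcd << 26) | (d << 21) | (a << 16) | (disp & 0xFFFF)


lwz = d_form(32, d=3, a=1, disp=8)     # lwz r3,8(r1)   -> 0x80610008
lbz = d_form(34, d=5, a=9, disp=-1)    # lbz r5,-1(r9)  -> 0x88A9FFFF
lhzu = d_form(41, d=6, a=7, disp=2)    # lhzu r6,2(r7)  -> 0xA4C70002
```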
Table A-14. Integer Store Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
stb 38 S A d
stbu 39 S A d
stbux 31 S A B 247 0
stbx 31 S A B 215 0
sth 44 S A d
sthu 45 S A d
sthux 31 S A B 439 0
sthx 31 S A B 407 0
stw 36 S A d
stwu 37 S A d
stwux 31 S A B 183 0
stwx 31 S A B 151 0
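Going the other way, a decoder has to sign-extend d before using it. This sketch splits a D-form store word back into its fields; `decode_d_form` is a hypothetical name, and the sample word encodes stwu r1,-16(r1).

```python
# Hypothetical helper for illustration; not part of any disassembler API.
def decode_d_form(word):
    """Split a D-form word into (OPCD, S or D, A, d), sign-extending the 16-bit d field."""
    opcd = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    d = word & 0xFFFF
    if d & 0x8000:          # bit 16 of the word is the sign bit of d
        d -= 0x10000
    return opcd, rs, ra, d


fields = decode_d_form(0x9421FFF0)   # (37, 1, 1, -16): stwu r1,-16(r1)
```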


























Table A-15. Integer Load and Store with Byte Reverse Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
lhbrx 31 D A B 790 0
lwbrx 31 D A B 534 0
sthbrx 31 S A B 918 0
stwbrx 31 S A B 662 0


























Table A-16. Integer Load and Store Multiple Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
lmw¹ 46 D A d
stmw¹ 47 S A d

Note:
¹ Load/store string/multiple instruction


Table A-17. Integer Load and Store String Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
lswi¹ 31 D A NB 597 0
lswx¹ 31 D A B 533 0
stswi¹ 31 S A NB 725 0
stswx¹ 31 S A B 661 0

Note:
¹ Load/store string/multiple instruction

Table A-18. Memory Synchronization Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
eieio 31 00000 00000 00000 854 0
isync 19 00000 00000 00000 150 0
lwarx 31 D A B 20 0
stwcx. 31 S A B 150 1
sync 31 00000 00000 00000 598 0

Table A-19. Floating-Point Load Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
lfd 50 D A d
lfdu 51 D A d
lfdux 31 D A B 631 0
lfdx 31 D A B 599 0
lfs 48 D A d
lfsu 49 D A d
lfsux 31 D A B 567 0
lfsx 31 D A B 535 0
Table A-20. Floating-Point Store Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
stfd 54 S A d
stfdu 55 S A d
stfdux 31 S A B 759 0
stfdx 31 S A B 727 0
stfiwx¹ 31 S A B 983 0
stfs 52 S A d
stfsu 53 S A d
stfsux 31 S A B 695 0
stfsx 31 S A B 663 0

Note:
¹ Optional instruction


Table A-21. Floating-Point Move Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
fabsx 63 D 00000 B 264 Rc
fmrx 63 D 00000 B 72 Rc
fnabsx 63 D 00000 B 136 Rc
fnegx 63 D 00000 B 40 Rc


























Table A-22. Branch Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
bx 18 LI AA LK
bcx 16 BO BI BD AA LK
bcctrx 19 BO BI 00000 528 LK
bclrx 19 BO BI 00000 16 LK
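The branch rows differ mainly in how the target is supplied. This sketch packs the I-form b family and the XL-form bclr and bcctr rows; the helper names are hypothetical, and BO = 20 (0b10100, "branch always") is used for the blr and bctrl examples.

```python
# Hypothetical helpers for illustration; not part of any assembler API.
def i_form(offset, aa=0, lk=0):
    """Pack b/ba/bl/bla (opcode 18); LI (bits 6-29) holds the word-aligned signed offset."""
    if offset % 4 or not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("offset must be word-aligned and fit in 26 signed bits")
    return (18 << 26) | (offset & 0x03FFFFFC) | (aa << 1) | lk


def xl_branch(bo, bi, xo, lk=0):
    """Pack bclr (XO 16) or bcctr (XO 528): opcode 19, BO, BI, bits 16-20 zero, XO, LK."""
    return (19 << 26) | (bo << 21) | (bi << 16) | (xo << 1) | lk


b_back = i_form(-4)               # branch back 4 bytes  -> 0x4BFFFFFC
bl_fwd = i_form(8, lk=1)          # bl 8 bytes ahead     -> 0x48000009
blr = xl_branch(20, 0, 16)        # blr                  -> 0x4E800020
bctrl = xl_branch(20, 0, 528, 1)  # bctrl                -> 0x4E800421
```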
Table A-23. Condition Register Logical Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
crand 19 crbD crbA crbB 257 0
crandc 19 crbD crbA crbB 129 0
creqv 19 crbD crbA crbB 289 0
crnand 19 crbD crbA crbB 225 0
crnor 19 crbD crbA crbB 33 0
cror 19 crbD crbA crbB 449 0
crorc 19 crbD crbA crbB 417 0
crxor 19 crbD crbA crbB 193 0
mcrf 19 crfD 00 crfS 00 00000 0000000000 0
Table A-24. System Linkage Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
rfi¹ 19 00000 00000 00000 50 0
sc 17 00000 00000 00000000000000 1 0

Note:
¹ Supervisor-level instruction
Table A-25. Trap Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
tw 31 TO A B 4 0
twi 03 TO A SIMM
Table A-26. Processor Control Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
mcrxr 31 crfD 00 00000 00000 512 0
mfcr 31 D 00000 00000 19 0
mfmsr¹ 31 D 00000 00000 83 0
mfspr² 31 D spr 339 0
mftb 31 D tbr 371 0
mtcrf 31 S 0 CRM 0 144 0
mtmsr¹ 31 S 00000 00000 146 0
mtspr² 31 S spr 467 0

Notes:
¹ Supervisor-level instruction
² Supervisor- and user-level instruction
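The spr field in the mfspr and mtspr rows holds the SPR number with its two 5-bit halves swapped: the low-order five bits of the number go in bits 11-15 and the high-order five bits in bits 16-20. A sketch, using the link register (SPR 8) and SPRG0 (SPR 272) as examples; `spr_field` and `xfx_spr` are hypothetical helpers.

```python
# Hypothetical helpers for illustration; not part of any assembler API.
def spr_field(spr):
    """Return the 10-bit instruction field for an SPR number, halves swapped."""
    return ((spr & 0x1F) << 5) | ((spr >> 5) & 0x1F)


def xfx_spr(xo, reg, spr):
    """Pack mfspr (XO 339, reg = rD) or mtspr (XO 467, reg = rS)."""
    return (31 << 26) | (reg << 21) | (spr_field(spr) << 11) | (xo << 1)


LR, SPRG0 = 8, 272
mflr_r0 = xfx_spr(339, 0, LR)        # mfspr r0,LR    -> 0x7C0802A6
mtlr_r0 = xfx_spr(467, 0, LR)        # mtspr LR,r0    -> 0x7C0803A6
mfsprg_r3 = xfx_spr(339, 3, SPRG0)   # mfspr r3,SPRG0 -> 0x7C7042A6
```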
Table A-27. Cache Management Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
dcba¹ 31 00000 A B 758 0
dcbf 31 00000 A B 86 0
dcbi² 31 00000 A B 470 0
dcbst 31 00000 A B 54 0
dcbt 31 00000 A B 278 0
dcbtst 31 00000 A B 246 0
dcbz 31 00000 A B 1014 0
icbi 31 00000 A B 982 0

Notes:
¹ Optional instruction
² Supervisor-level instruction
Table A-28. Segment Register Manipulation Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
mfsr¹ 31 D 0 SR 00000 595 0
mfsrin¹ 31 D 00000 B 659 0
mtsr¹ 31 S 0 SR 00000 210 0
mtsrin¹ 31 S 00000 B 242 0

Note:
¹ Supervisor-level instruction
Table A-29. Lookaside Buffer Management Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
tlbia¹,² 31 00000 00000 00000 370 0
tlbie¹,² 31 00000 00000 B 306 0
tlbsync¹ 31 00000 00000 00000 566 0

Notes:
¹ Supervisor-level instruction
² Optional instruction
Table A-30. External Control Instructions

Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
eciwx 31 D A B 310 0
ecowx 31 S A B 438 0
A.4 Instructions Sorted by Form
Table A-31 through Table A-41 list the PowerPC instructions grouped by form.
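All of the form tables use the same convention: bit 0 is the most significant bit of the 32-bit word, and a field is named by its first and last bit. A small extraction helper (a hypothetical name, shown only as a sketch) makes the mapping explicit:

```python
# Hypothetical helper for illustration; not part of any disassembler API.
def field(word, first, last):
    """Return bits first..last (inclusive) of a 32-bit word; bit 0 is the MSB."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


word = 0x7C832378          # or r3,r4,r4 (X-form)
opcd = field(word, 0, 5)   # primary opcode -> 31
rs = field(word, 6, 10)    # S -> 4
ra = field(word, 11, 15)   # A -> 3
xo = field(word, 21, 30)   # extended opcode -> 444
rc = field(word, 31, 31)   # record bit -> 0
```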


Table A-31. I-Form

OPCD LI AA LK

Specific Instruction
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
bx 18 LI AA LK




















Table A-32. B-Form

OPCD BO BI BD AA LK

Specific Instruction
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
bcx 16 BO BI BD AA LK


























Table A-33. SC-Form

OPCD 00000 00000 00000000000000 1 0

Specific Instruction
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
sc 17 00000 00000 00000000000000 1 0


























Table A-34. D-Form

OPCD D A d
OPCD D A SIMM
OPCD S A d
OPCD S A UIMM
OPCD crfD 0 L A SIMM
OPCD crfD 0 L A UIMM
OPCD TO A SIMM

Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
addi 14 D A SIMM
addic 12 D A SIMM
addic. 13 D A SIMM
addis 15 D A SIMM
andi. 28 S A UIMM
andis. 29 S A UIMM
cmpi 11 crfD 0 L A SIMM
cmpli 10 crfD 0 L A UIMM
lbz 34 D A d
lbzu 35 D A d
lfd 50 D A d
lfdu 51 D A d
lfs 48 D A d
lfsu 49 D A d
lha 42 D A d
lhau 43 D A d
lhz 40 D A d
lhzu 41 D A d
lmw¹ 46 D A d
lwz 32 D A d
lwzu 33 D A d
mulli 07 D A SIMM
ori 24 S A UIMM
oris 25 S A UIMM
stb 38 S A d
stbu 39 S A d
stfd 54 S A d
stfdu 55 S A d
stfs 52 S A d
stfsu 53 S A d
sth 44 S A d
sthu 45 S A d
stmw¹ 47 S A d
stw 36 S A d
stwu 37 S A d
subfic 08 D A SIMM
twi 03 TO A SIMM
xori 26 S A UIMM
xoris 27 S A UIMM

Note:
¹ Load/store string/multiple instruction


Table A-35. X-Form

OPCD D A B XO 0
OPCD D A NB XO 0
OPCD D 00000 B XO 0
OPCD D 00000 00000 XO 0
OPCD D 0 SR 00000 XO 0
OPCD S A B XO Rc
OPCD S A B XO 1
OPCD S A B XO 0
OPCD S A NB XO 0
OPCD S A 00000 XO Rc
OPCD S 00000 B XO 0
OPCD S 00000 00000 XO 0
OPCD S 0 SR 00000 XO 0
OPCD S A SH XO Rc
OPCD crfD 0 L A B XO 0
OPCD crfD 00 A B XO 0
OPCD crfD 00 crfS 00 00000 XO 0
OPCD crfD 00 00000 00000 XO 0
OPCD crfD 00 00000 IMM 0 XO Rc
OPCD TO A B XO 0
OPCD D 00000 B XO Rc
OPCD D 00000 00000 XO Rc
OPCD crbD 00000 00000 XO Rc
OPCD 00000 A B XO 0
OPCD 00000 00000 B XO 0
OPCD 00000 00000 00000 XO 0

Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
andx 31 S A B 28 Rc
andcx 31 S A B 60 Rc
cmp 31 crfD 0 L A B 0 0
cmpl 31 crfD 0 L A B 32 0
cntlzwx 31 S A 00000 26 Rc
dcba¹ 31 00000 A B 758 0
dcbf 31 00000 A B 86 0
dcbi² 31 00000 A B 470 0
dcbst 31 00000 A B 54 0
dcbt 31 00000 A B 278 0
dcbtst 31 00000 A B 246 0
dcbz 31 00000 A B 1014 0
eciwx 31 D A B 310 0
ecowx 31 S A B 438 0
eieio 31 00000 00000 00000 854 0
eqvx 31 S A B 284 Rc
extsbx 31 S A 00000 954 Rc
extshx 31 S A 00000 922 Rc
fabsx 63 D 00000 B 264 Rc
fcmpo 63 crfD 00 A B 32 0
fcmpu 63 crfD 00 A B 0 0
fctiwx 63 D 00000 B 14 Rc
fctiwzx 63 D 00000 B 15 Rc
fmrx 63 D 00000 B 72 Rc
fnabsx 63 D 00000 B 136 Rc
fnegx 63 D 00000 B 40 Rc
frspx 63 D 00000 B 12 Rc
icbi 31 00000 A B 982 0
lbzux 31 D A B 119 0
lbzx 31 D A B 87 0
lfdux 31 D A B 631 0
lfdx 31 D A B 599 0
lfsux 31 D A B 567 0
lfsx 31 D A B 535 0
lhaux 31 D A B 375 0
lhax 31 D A B 343 0
lhbrx 31 D A B 790 0
lhzux 31 D A B 311 0
lhzx 31 D A B 279 0
lswi³ 31 D A NB 597 0
lswx³ 31 D A B 533 0
lwarx 31 D A B 20 0
lwbrx 31 D A B 534 0
lwzux 31 D A B 55 0
lwzx 31 D A B 23 0
mcrfs 63 crfD 00 crfS 00 00000 64 0
mcrxr 31 crfD 00 00000 00000 512 0
mfcr 31 D 00000 00000 19 0
mffsx 63 D 00000 00000 583 Rc
mfmsr² 31 D 00000 00000 83 0
mfsr² 31 D 0 SR 00000 595 0
mfsrin² 31 D 00000 B 659 0
mtfsb0x 63 crbD 00000 00000 70 Rc
mtfsb1x 63 crbD 00000 00000 38 Rc
mtfsfix 63 crfD 00 00000 IMM 0 134 Rc
mtmsr² 31 S 00000 00000 146 0
mtsr² 31 S 0 SR 00000 210 0
mtsrin² 31 S 00000 B 242 0
nandx 31 S A B 476 Rc
norx 31 S A B 124 Rc
orx 31 S A B 444 Rc
orcx 31 S A B 412 Rc
slwx 31 S A B 24 Rc
srawx 31 S A B 792 Rc
srawix 31 S A SH 824 Rc
srwx 31 S A B 536 Rc
stbux 31 S A B 247 0
stbx 31 S A B 215 0
stfdux 31 S A B 759 0
stfdx 31 S A B 727 0
stfiwx¹ 31 S A B 983 0
stfsux 31 S A B 695 0
stfsx 31 S A B 663 0
sthbrx 31 S A B 918 0
sthux 31 S A B 439 0
sthx 31 S A B 407 0
stswi³ 31 S A NB 725 0
stswx³ 31 S A B 661 0
stwbrx 31 S A B 662 0
stwcx. 31 S A B 150 1
stwux 31 S A B 183 0
stwx 31 S A B 151 0
sync 31 00000 00000 00000 598 0
tlbia¹,² 31 00000 00000 00000 370 0
tlbie¹,² 31 00000 00000 B 306 0
tlbsync² 31 00000 00000 00000 566 0
tw 31 TO A B 4 0
xorx 31 S A B 316 Rc

Notes:
¹ Optional instruction
² Supervisor-level instruction
³ Load/store string/multiple instruction
Appendix A. PowerPC Instruction Set Listings A-31 


Table A-36. XL-Form

OPCD BO BI 00000 XO LK
OPCD crbD crbA crbB XO 0
OPCD crfD 00 crfS 00 00000 XO 0
OPCD 00000 00000 00000 XO 0

Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
bcctrx 19 BO BI 00000 528 LK
bclrx 19 BO BI 00000 16 LK
crand 19 crbD crbA crbB 257 0
crandc 19 crbD crbA crbB 129 0
creqv 19 crbD crbA crbB 289 0
crnand 19 crbD crbA crbB 225 0
crnor 19 crbD crbA crbB 33 0
cror 19 crbD crbA crbB 449 0
crorc 19 crbD crbA crbB 417 0
crxor 19 crbD crbA crbB 193 0
isync 19 00000 00000 00000 150 0
mcrf 19 crfD 00 crfS 00 00000 0 0
rfi¹ 19 00000 00000 00000 50 0
rfid¹ 19 00000 00000 00000 18 0

Note:
¹ Supervisor-level instruction

Table A-37. XFX-Form

OPCD D spr XO 0
OPCD S 0 CRM 0 XO 0
OPCD S spr XO 0
OPCD D tbr XO 0

Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
mfspr¹ 31 D spr 339 0
mftb 31 D tbr 371 0
mtcrf 31 S 0 CRM 0 144 0
mtspr¹ 31 S spr 467 0

Note:
¹ Supervisor- and user-level instruction
Table A-38. XFL-Form

OPCD 0 FM 0 B XO Rc

Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
mtfsfx 63 0 FM 0 B 711 Rc
Table A-39. XO-Form

OPCD D A B OE XO Rc
OPCD D A B 0 XO Rc
OPCD D A 00000 OE XO Rc

Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
addx 31 D A B OE 266 Rc
addcx 31 D A B OE 10 Rc
addex 31 D A B OE 138 Rc
addmex 31 D A 00000 OE 234 Rc
addzex 31 D A 00000 OE 202 Rc
divwx 31 D A B OE 491 Rc
divwux 31 D A B OE 459 Rc
mulhwx 31 D A B 0 75 Rc
mulhwux 31 D A B 0 11 Rc
mullwx 31 D A B OE 235 Rc
negx 31 D A 00000 OE 104 Rc
subfx 31 D A B OE 40 Rc
subfcx 31 D A B OE 8 Rc
subfex 31 D A B OE 136 Rc
subfmex 31 D A 00000 OE 232 Rc
subfzex 31 D A 00000 OE 200 Rc
Table A-40. A-Form

OPCD D A B 00000 XO Rc
OPCD D A B C XO Rc
OPCD D A 00000 C XO Rc
OPCD D 00000 B 00000 XO Rc

Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
faddx 63 D A B 00000 21 Rc
faddsx 59 D A B 00000 21 Rc
fdivx 63 D A B 00000 18 Rc
fdivsx 59 D A B 00000 18 Rc
fmaddx 63 D A B C 29 Rc
fmaddsx 59 D A B C 29 Rc
fmsubx 63 D A B C 28 Rc
fmsubsx 59 D A B C 28 Rc
fmulx 63 D A 00000 C 25 Rc
fmulsx 59 D A 00000 C 25 Rc
fnmaddx 63 D A B C 31 Rc
fnmaddsx 59 D A B C 31 Rc
fnmsubx 63 D A B C 30 Rc
fnmsubsx 59 D A B C 30 Rc
fresx¹ 59 D 00000 B 00000 24 Rc
frsqrtex¹ 63 D 00000 B 00000 26 Rc
fselx¹ 63 D A B C 23 Rc
fsqrtx¹ 63 D 00000 B 00000 22 Rc
fsqrtsx¹ 59 D 00000 B 00000 22 Rc
fsubx 63 D A B 00000 20 Rc
fsubsx 59 D A B 00000 20 Rc

Note:
¹ Optional instruction
Table A-41. M-Form

OPCD S A SH MB ME Rc
OPCD S A B MB ME Rc

Specific Instructions
Name 0 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
rlwimix 20 S A SH MB ME Rc
rlwinmx 21 S A SH MB ME Rc
rlwnmx 23 S A B MB ME Rc
A.5 Instruction Set Legend


Table A-42 provides general information on the PowerPC instruction set (such as the 
architectural level, privilege level, and form). 


Table A-42. PowerPC Instruction Set Legend 


Supervisor Level Optional 


cntlzwx
fnabsx
fnmaddsx
fnmsubsx
mulhwux
rlwinmx
srawix

Notes:
¹ Supervisor- and user-level instruction
² Load/store string or multiple instruction
Appendix B
POWER Architecture Cross Reference 


This appendix identifies the incompatibilities that must be managed in migration from the
POWER architecture to the PowerPC architecture. Some of the incompatibilities can, at least
in principle, be detected by the processor, which traps and lets software simulate the
POWER operation. Others cannot be detected by the processor.


In general, the incompatibilities identified here are those that affect a POWER application 
program. Incompatibilities for instructions that can be used only by POWER system 
programs are not discussed. Note that this appendix describes incompatibilities with 
respect to the PowerPC architecture in general. 


B.1 New Instructions, Formerly Supervisor-Level Instructions


Instructions new to PowerPC typically use opcode values (including extended opcode) that
are illegal in the POWER architecture. A few instructions that are supervisor-level in the
POWER architecture (for example, dclz, called dcbz in the PowerPC architecture) have
been made user-level in the PowerPC architecture. Any POWER program that executes one
of these now-valid, or now-user-level, instructions expecting it to invoke the system illegal
instruction error handler (program exception) or the system supervisor-level instruction
error handler will not execute correctly on PowerPC processors. (Note that, in the
architecture specification, user- and supervisor-level are referred to as problem and
privileged state, respectively, and exceptions are referred to as interrupts.)


B.2 New Supervisor-Level Instructions 


The following instructions are user-level in the POWER architecture but are
supervisor-level in PowerPC processors.


• mfmsr
• mfsr
B.3 Reserved Bits in Instructions


These are shown as zeros and the bit field is shaded in the instruction opcode definitions. 
In the POWER architecture such bits are ignored by the processor. In the PowerPC 
architecture they must be zero or the instruction form is invalid. In several cases, the 
PowerPC architecture assumes that such bits in POWER instructions are indeed zero. The 
cases include the following: 


• cmpi, cmp, cmpli, and cmpl assume that bit 10 in the POWER instructions is 0.
• mtspr and mfspr assume that bits 16-20 in the POWER instructions are 0.
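Both assumptions can be expressed as bit tests on the instruction word. The sketch below (hypothetical helper names) checks bit 10 of a compare and bits 16-20 of an SPR move; the sample words encode cmpw cr0,r3,r4 (0x7C032000), the same compare with bit 10 set (0x7C232000), mfspr r0,LR (0x7C0802A6), and mfspr r3,SPRG0 (0x7C7042A6, SPR 272).

```python
# Hypothetical helpers for illustration; not part of any disassembler API.
def bits(word, first, last):
    """Return bits first..last of a 32-bit word, with bit 0 as the MSB."""
    return (word >> (31 - last)) & ((1 << (last - first + 1)) - 1)


def compare_bit10_clear(word):
    """True if bit 10 of a cmp/cmpl/cmpi/cmpli word is 0, as the mapping from POWER assumes."""
    return bits(word, 10, 10) == 0


def spr_high_half_clear(word):
    """True if bits 16-20 of an mtspr/mfspr word are 0, which holds for SPR numbers below 32."""
    return bits(word, 16, 20) == 0
```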


B.4 Reserved Bits in Registers 


The POWER architecture defines these bits to be zero when read, and either zero or one
when written to. In the PowerPC architecture, it is implementation-dependent, for each
register, whether these bits are read as zero and ignored when written, or are copied
from source to destination when read or written.


B.5 Alignment Check 


The AL bit in the POWER machine state register, MSR[24], is not supported in the 
PowerPC architecture. The bit is reserved in the PowerPC architecture. The low-order bits 
of the EA are always used. Notice that value zero—the normal value for a reserved SPR 
bit—means ignore the low-order EA bits in the POWER architecture, and value one means 
use the low-order EA bits. However, MSR[24] is not assigned new meaning in the PowerPC 
architecture. 


B.6 Condition Register 


The following instructions specify a field in the condition register (CR) explicitly (via the 
crfD field) and also have the record bit (Rc) option. In the PowerPC architecture, if Rc = 1 
for these instructions the instruction form is invalid. In the POWER architecture, if Rc = 1 
the instructions execute normally except as shown in Table B-1. 


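Table B-1. Condition Register Settings


Instruction  Condition Register Setting
cmp    CR0 is undefined if Rc = 1 and crfD ≠ 0
cmpl   CR0 is undefined if Rc = 1 and crfD ≠ 0
mcrxr  CR0 is undefined if Rc = 1 and crfD ≠ 0
fcmpo  CR1 is undefined if Rc = 1
fcmpu  CR1 is undefined if Rc = 1
mcrfs  CR1 is undefined if Rc = 1 and crfD ≠ 1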
B.7 Inappropriate Use of LK and Rc bits
For the instructions listed below, if LK = 1 or Rc = 1, POWER processors execute the
instruction normally with the exception of setting the link register (if LK = 1) or the CR0
or CR1 fields (if Rc = 1) to an undefined value. In the PowerPC architecture, such
instruction forms are invalid.
The PowerPC instruction form is invalid if LK = 1:

• sc (svc in the POWER architecture)
• Condition register logical instructions (that is, crand, crandc, creqv, crnand,
crnor, cror, crorc, and crxor)
• mcrf
• isync (ics in the POWER architecture)

The PowerPC instruction form is invalid if Rc = 1:
• Integer X-form load and store instructions:
  - X-form load instructions: lbzux, lbzx, lhaux, lhax, lhbrx, lhzux, lhzx, lswi,
    lswx, lwarx, lwbrx, lwzux, lwzx
  - X-form store instructions: stbux, stbx, sthbrx, sthux, sthx, stswi, stswx,
    stwbrx, stwcx., stwux, stwx
• Integer X-form compare instructions (that is, cmp, cmpl)
• X-form trap instruction (that is, tw)
• mtspr, mfspr, mtcrf, mcrxr, mfcr
• Floating-point X-form load and store instructions and floating-point compare
instructions:
  - Floating-point X-form load instructions: lfdux, lfdx, lfsux, lfsx
  - Floating-point X-form store instructions: stfdux, stfdx, stfiwx, stfsux, stfsx
  - Floating-point X-form compare instructions: fcmpo, fcmpu
• mcrfs
• dcbz (dclz in the POWER architecture)


B.8 BO Field 


The POWER architecture shows certain bits in the BO field—used by branch conditional 
instructions—as x without indicating how these bits are to be interpreted. These bits are 
ignored by POWER processors. 


The PowerPC architecture shows these bits as either z or y. The z bits are ignored, as in 
POWER. However, the y bit need not be ignored, but rather can be used to give a hint about 
whether the branch is likely to be taken. If a POWER program has the incorrect value for 
this bit, the program will run correctly but performance may suffer. 
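A sketch of the branch-conditional rule shows why a wrong y bit costs only performance: whether the branch is taken depends on BO[0] through BO[3] alone. It assumes the PowerPC definition (CTR is decremented and tested only when BO[2] = 0; the condition test is skipped when BO[0] = 1) and ignores any prediction hardware; `bc_outcome` is a hypothetical name.

```python
# Hypothetical helper for illustration; models only the architected outcome, not prediction.
def bc_outcome(bo, cr_bit, ctr):
    """Return (taken, new_ctr) for a branch conditional with 5-bit BO.

    BO[0] is the most significant bit of the field and BO[4] (the y bit) the least.
    """
    b = [(bo >> (4 - i)) & 1 for i in range(5)]
    if not b[2]:                              # BO[2] = 0: decrement CTR
        ctr = (ctr - 1) & 0xFFFFFFFF
    ctr_ok = bool(b[2]) or ((ctr != 0) != bool(b[3]))
    cond_ok = bool(b[0]) or (cr_bit == b[1])
    return ctr_ok and cond_ok, ctr            # b[4] (y) is never consulted
```

For example, BO = 0b01100 and BO = 0b01101 differ only in y and always produce the same result.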
B.9 Branch Conditional to Count Register


For the case in which the count register is decremented and tested (that is, the case in which 
BO[2] = 0), the POWER architecture specifies only that the branch target address is 
undefined, implying that the count register, and the link register (if LK = 1), are updated in 
the normal way. The PowerPC architecture considers this instruction form invalid. 
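For bcctr, the invalid case can be detected from the BO field alone. A sketch (hypothetical helper) that rejects a bcctr word whose BO[2] = 0; 0x4E800420 is bctr (BO = 0b10100), while 0x4E000420 asks for a CTR decrement (BO = 0b10000).

```python
# Hypothetical helper for illustration; not part of any disassembler API.
def bcctr_form_valid(word):
    """Return False when a bcctr word (opcode 19, XO 528) has BO[2] = 0."""
    if ((word >> 26) & 0x3F) != 19 or ((word >> 1) & 0x3FF) != 528:
        raise ValueError("not a bcctr instruction")
    bo = (word >> 21) & 0x1F
    return bool(bo & 0b00100)  # BO[2] is the middle bit of the 5-bit BO field
```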


B.10 System Call/Supervisor Call


The System Call (sc) instruction in the PowerPC architecture is called Supervisor Call
(svc) in the POWER architecture. Differences in implementations are as follows:


• The POWER architecture provides a version of the svc instruction (bit 30 = 0) that
allows instruction fetching to continue at any one of 128 locations. It is used for “fast
Supervisor Calls.” The PowerPC architecture provides no such version. If bit 30 of
the instruction is zero, the instruction form is invalid.

• The POWER architecture provides a version of the svc instruction
(bits 30-31 = 0b11) that resumes instruction fetching at one location and sets the
link register (LR) to the address of the next instruction. The PowerPC architecture
provides no such version; if Rc = 1, the instruction form is invalid.

• For the POWER architecture, information from the MSR is saved in the count
register (CTR). For the PowerPC architecture, this information is saved in
machine status save/restore register 1 (SRR1).

• The POWER architecture permits bits 16-29 of the instruction to be nonzero, while
in the PowerPC architecture, such an instruction form is invalid.

• The POWER architecture saves the low-order 16 bits of the svc instruction in the
CTR; the PowerPC architecture does not save them.

• The settings of the MSR bits by the system call exception differ between the
POWER architecture and the PowerPC architecture.
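Several of these differences amount to a stricter check on the instruction word. A sketch, assuming the PowerPC sc layout of opcode 17, bits 6-29 zero, bit 30 = 1, and bit 31 = 0; `is_valid_sc` is a hypothetical name:

```python
# Hypothetical helper for illustration; not part of any disassembler API.
def is_valid_sc(word):
    """Check a 32-bit word against the PowerPC sc layout."""
    opcode_ok = ((word >> 26) & 0x3F) == 17
    middle_zero = (word & 0x03FFFFFC) == 0     # bits 6-29 must be zero
    low_bits_ok = (word & 0b11) == 0b10        # bit 30 = 1, bit 31 = 0
    return opcode_ok and middle_zero and low_bits_ok
```

Under this check, 0x44000002 is accepted, while 0x44000000 (bit 30 = 0, the POWER "fast" form) and 0x44000003 (bits 30-31 = 0b11) are rejected.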


B.11 XER Register 


Bits 16-23 of the XER are reserved in the PowerPC architecture, whereas in the POWER
architecture they are defined to contain the comparison byte for the lscbx instruction, which
is not included in the PowerPC architecture.


B.12 Update Forms of Memory Access 


The PowerPC architecture requires that rA not be equal to either rD (integer load only) or 
zero. If the restriction is violated, the instruction form is invalid. See Section 4.1.3, “Classes 
of Instructions,” for information about invalid instructions. The POWER architecture 
permits these cases and simply avoids saving the EA. 
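The update-form restriction reduces to two register comparisons. A sketch with a hypothetical helper; pass `rd=None` for instructions other than integer loads, where only the rA ≠ 0 rule applies.

```python
# Hypothetical helper for illustration; models only the register-number rule.
def update_form_valid(ra, rd=None):
    """PowerPC update forms require rA != 0, and rA != rD for integer loads."""
    if ra == 0:
        return False
    return rd is None or ra != rd
```

For example, lwzu r3,4(r1) passes, while lwzu r3,4(r3) and lwzu r3,4(r0) are invalid forms.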
B.13 Multiple Register Loads


When executing instructions that load multiple registers, the PowerPC architecture requires 
that rA, and rB if present in the instruction format, not be in the range of registers to be 
loaded, while the POWER architecture permits this and does not alter rA or rB in this case. 
(The PowerPC architecture restriction applies even if rA = 0, although there is no obvious 
benefit to the restriction in this case since rA is not used to compute the effective address 
if rA = 0.) If the PowerPC architecture restriction is violated, either the system illegal 
instruction error handler is invoked or the results are boundedly undefined. 


The instructions affected are listed as follows: 
• lmw (lm in the POWER architecture)
• lswi (lsi in the POWER architecture)
• lswx (lsx in the POWER architecture)


For example, an lmw instruction that loads all 32 registers is valid in the POWER
architecture but is an invalid form in the PowerPC architecture.
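For lmw the restriction is a range test: the instruction loads rD through r31, and rA must fall outside that range even when rA = 0. A sketch with a hypothetical helper name:

```python
# Hypothetical helper for illustration; covers lmw only (lswi/lswx also involve NB or XER).
def lmw_form_valid(rd, ra):
    """lmw loads registers rD..r31; PowerPC requires rA to be outside that range."""
    return ra not in range(rd, 32)
```

So lmw r3,0(r1) is valid, while lmw r0,0(r1), which loads all 32 registers, is not.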


B.14 Alignment for Load/Store Multiple 


When executing load/store multiple instructions, the PowerPC architecture requires the EA 
to be word-aligned and yields an alignment exception or boundedly-undefined results if it 
is not. The POWER architecture specifies that an alignment exception occurs (if AL = 1). 


B.15 Load and Store String Instructions 


In the PowerPC architecture, an lswx instruction with zero length leaves the content of rD
undefined (if rD ≠ rA and rD ≠ rB) or is an invalid instruction form (if rD = rA or
rD = rB), while in the POWER architecture the corresponding instruction (lsx) is a no-op
in these cases.


Note also that, in the PowerPC architecture, an lswx instruction with zero length may alter
the referenced bit, and an stswx instruction with zero length may alter the referenced and
changed bits, while in the POWER architecture the corresponding instructions (lsx and
stsx) do not alter the referenced and changed bits.


B.16 Synchronization 


The sync instruction (called dcs in the POWER architecture) and the isync instruction
(called ics in the POWER architecture) cause a much more pervasive synchronization
in the PowerPC architecture than in the POWER architecture. For more information, refer
to Chapter 8, “Instruction Set.”
B.17 Move to/from SPR


Differences in how the Move to/from Special Purpose Register (mtspr and mfspr) 
instructions function are as follows: 


• The SPR field is 10 bits long in the PowerPC architecture, but only 5 bits in the POWER
architecture.

• The mfspr instruction can be used to read the decrementer (DEC) register in
problem state (user mode) in the POWER architecture, but only in supervisor state
in the PowerPC architecture.

• If the SPR value specified in the instruction is not one of the defined values, the
POWER architecture behaves as follows:

  - If the instruction is executed in user-level privilege state and SPR[0] = 1, a
    supervisor-level instruction type program exception occurs. No architected
    registers are altered except those set by the exception.

  - If the instruction is executed in supervisor-level privilege state and SPR[0] = 0,
    no architected registers are altered.

In this same case, the PowerPC architecture behaves as follows:

  - If the instruction is executed in user-level privilege state and SPR[0] = 1, either
    an illegal instruction type program exception or a supervisor-level instruction
    type program exception occurs. No architected registers are altered except those
    set by the exception.

  - Otherwise (the instruction is executed in supervisor-level privilege state or
    SPR[0] = 0), either an illegal instruction type program exception occurs (in
    which case no architected registers are altered except those set by the exception)
    or the results are boundedly undefined.
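SPR[0] in these rules is bit 0 of the instruction's 10-bit spr field. Because that field stores the SPR number with its 5-bit halves swapped, SPR[0] is the high-order bit of the number's low five bits. A sketch (hypothetical helper names) that derives it and applies the PowerPC rules above; DEC (SPR 22), for instance, has SPR[0] = 1, consistent with its being readable only in supervisor state.

```python
# Hypothetical helpers for illustration; they restate the rules above, nothing more.
def spr_bit0(spr):
    """Bit 0 of the swapped spr field, i.e. bit 4 (counting from the LSB) of the SPR number."""
    return (spr >> 4) & 1


def undefined_spr_outcome(spr, user_mode):
    """PowerPC behavior for an mtspr/mfspr naming an undefined SPR."""
    if user_mode and spr_bit0(spr):
        return "program exception (illegal instruction or supervisor-level instruction type)"
    return "illegal instruction type program exception, or boundedly undefined results"
```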


B.18 Effects of Exceptions on FPSCR Bits FR and FI


For the following cases, the POWER architecture does not specify how the FR and FI bits
are set, while the PowerPC architecture preserves them for invalid operation exceptions
caused by compare instructions and clears them otherwise.


• Invalid operation exception (enabled or disabled)
• Zero divide exception (enabled or disabled)
• Disabled overflow exception




B.19 Floating-Point Store Single Instructions 


There are several respects in which the PowerPC architecture is incompatible with the 
POWER architecture when executing store floating-point single instructions. 


The POWER architecture uses FPSCR[UE] to help determine whether denormalization 
should be done, while the PowerPC architecture does not. Note that in the PowerPC 
architecture, if FPSCR[UE] = 1 and a denormalized single-precision number is copied from 
one memory location to another by means of an Ifs instruction followed by an stfs 
instruction, the two “copies” may not be the same. Refer to Section 3.3.6.2.2, “Underflow 
Exception Condition,” for more information about underflow exceptions. 


For an operand having a biased exponent that is less than 874 (an unbiased exponent less than -149), the POWER architecture specifies storage of a zero (if FPSCR[UE] = 0), while the PowerPC architecture specifies the storage of an undefined value.
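As a quick check of the threshold arithmetic, the following C sketch (assuming IEEE 754 binary64 doubles; `biased_exponent`, `exponent_below_874`, and `STFS_TINY_LIMIT` are hypothetical names) extracts the biased exponent, which is bits 1-11 of the double-precision image, and compares it with 874 = 1023 - 149. A biased exponent below 874 means the magnitude is less than 2^-149, the smallest single-precision denormalized magnitude.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Biased exponent 1023 - 149 = 874 corresponds to 2^-149. */
enum { STFS_TINY_LIMIT = 1023 - 149 };

/* Return the 11-bit biased exponent of a double: bits 1-11 of the 64-bit
   image in this book's big-endian bit numbering. */
static unsigned biased_exponent(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the bits portably */
    return (unsigned)((bits >> 52) & 0x7FF);
}

/* True when the operand's magnitude is below 2^-149. Zeros and
   double-precision denormalized numbers (biased exponent 0) also qualify. */
static int exponent_below_874(double x)
{
    return biased_exponent(x) < STFS_TINY_LIMIT;
}
```

For example, 0x1p-149 has biased exponent 874 and is not below the limit, while 0x1p-150 has biased exponent 873 and is.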


B.20 Move from FPSCR 


The POWER architecture defines the high-order 32 bits of the result of mffs to be 0xFFFF_FFFF. In the PowerPC architecture they are undefined.


B.21 Clearing Bytes in the Data Cache 
The dclz instruction of the POWER architecture and the dcbz instruction of the PowerPC architecture have the same opcode. However, the functions differ in the following respects:

• The dclz instruction clears a line; dcbz clears a block.
• The dclz instruction saves the EA in rA (if rA ≠ 0); dcbz does not.
• The dclz instruction is supervisor-level; dcbz is not.


B.22 Segment Register Instructions 


The definitions of the four segment register instructions (mtsr, mtsrin, mfsr, and mfsrin) 
differ in two respects between the POWER architecture and the PowerPC architecture. 
Instructions similar to mtsrin and mfsrin are called mtsri and mfsri in the POWER 
architecture. The definitions follow: 


• Privilege: mfsr and mfsri are problem state instructions in the POWER architecture, while mfsr and mfsrin are supervisor-level in the PowerPC architecture.


• Function: the indirect instructions (mtsri and mfsri) in the POWER architecture use an rA register in computing the segment register number, and the computed EA is stored into rA (if rA ≠ 0 and rA ≠ rD); in the PowerPC architecture, mtsrin and mfsrin have no rA field and the EA is not stored.
The mtsr, mtsrin (mtsri), and mfsr instructions have the same opcodes in the PowerPC architecture as in the POWER architecture. The mfsri instruction in the POWER architecture and the mfsrin instruction in the PowerPC architecture have different opcodes.


B.23 TLB Entry Invalidation 


The tlbi instruction in the POWER architecture and the tlbie instruction in the PowerPC 
architecture have the same opcode. However, the functions differ in the following respects. 


• The tlbi instruction computes the EA as (rA|0) + rB, while tlbie lacks an rA field and computes the EA as rB.

• The tlbi instruction saves the EA in rA (if rA ≠ 0); tlbie lacks an rA field and does not save the EA.
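A small C model of the two effective-address rules may help. The register-file type `cpu_t` and both function names are hypothetical. The (rA|0) notation means that rA = 0 selects the value 0 rather than the contents of GPR0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: gpr[0..31]. */
typedef struct { uint32_t gpr[32]; } cpu_t;

/* POWER tlbi: EA = (rA|0) + (rB); the EA is written back to rA when rA != 0. */
static uint32_t power_tlbi_ea(cpu_t *c, unsigned ra, unsigned rb)
{
    uint32_t base = (ra == 0) ? 0 : c->gpr[ra];   /* (rA|0): rA = 0 reads as 0 */
    uint32_t ea = base + c->gpr[rb];
    if (ra != 0)
        c->gpr[ra] = ea;                           /* EA saved in rA */
    return ea;
}

/* PowerPC tlbie: no rA field; EA = (rB) and no register is updated. */
static uint32_t ppc_tlbie_ea(const cpu_t *c, unsigned rb)
{
    return c->gpr[rb];
}
```

With GPR3 = 0x1000 and GPR4 = 0x20, the tlbi model returns 0x1020 and leaves 0x1020 in GPR3, whereas the tlbie model returns 0x20 and changes nothing.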


B.24 Floating-Point Exceptions 


Both the PowerPC and the POWER architectures use bit 20 of the MSR to control the generation of exceptions for floating-point enabled exceptions. However, in the PowerPC architecture this bit is part of a 2-bit value that controls the occurrence, precision, and recoverability of the exception, whereas in the POWER architecture this bit is used independently to control the occurrence of the exception (in the POWER architecture, all floating-point exceptions are precise).
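For illustration only, the sketch below decodes the PowerPC 2-bit mode from a 32-bit MSR image. It assumes, as in the PowerPC MSR layout, that FE0 is MSR bit 20 and FE1 is MSR bit 23 (big-endian bit numbering, so bit n has mask 1 << (31 - n)). The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* MSR bit n (big-endian numbering, 32-bit MSR) has mask 1 << (31 - n). */
#define MSR_FE0 (1u << (31 - 20))   /* 0x00000800 */
#define MSR_FE1 (1u << (31 - 23))   /* 0x00000100 */

typedef enum {
    FP_EXC_DISABLED,             /* FE0 = 0, FE1 = 0: exceptions ignored */
    FP_EXC_IMPRECISE_NONRECOV,   /* FE0 = 0, FE1 = 1                      */
    FP_EXC_IMPRECISE_RECOV,      /* FE0 = 1, FE1 = 0                      */
    FP_EXC_PRECISE               /* FE0 = 1, FE1 = 1                      */
} fp_exc_mode;

/* PowerPC: FE0 and FE1 together select one of four modes. */
static fp_exc_mode fp_exception_mode(uint32_t msr)
{
    unsigned fe0 = (msr & MSR_FE0) != 0;
    unsigned fe1 = (msr & MSR_FE1) != 0;
    return (fp_exc_mode)((fe0 << 1) | fe1);
}

/* POWER: bit 20 alone enables the (always precise) exceptions. */
static int power_fp_exceptions_enabled(uint32_t msr)
{
    return (msr & MSR_FE0) != 0;
}
```

The same MSR image with only bit 20 set therefore means "enabled, precise" on POWER but "imprecise recoverable" on PowerPC.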


B.25 Timing Facilities 


This section describes differences between the POWER architecture and the PowerPC 
architecture timer facilities. 


B.25.1 Real-Time Clock 
The POWER real-time clock (RTC) is not supported in the PowerPC architecture. Instead, the PowerPC architecture provides a time base register (TB). Both the RTC and the TB are 64-bit special-purpose registers, but they differ in the following respects:


• The RTC counts seconds and nanoseconds, while the TB counts ticks. The frequency of the TB is implementation-dependent.


• The RTC increments discontinuously: 1 is added to RTCU when the value in RTCL passes 999_999_999. The TB increments continuously: 1 is added to TBU when the value in TBL passes 0xFFFF_FFFF.


• The RTC is written and read by the mtspr and mfspr instructions, using SPR numbers that denote RTCU and RTCL. The TB is written by the mtspr instruction (using new SPR numbers) and read by the new mftb instruction.


• The SPR numbers that denote the POWER architecture’s RTCL and RTCU are invalid in the PowerPC architecture.
• The RTC is guaranteed to increment at least once in the time required to execute ten Add Immediate (addi) instructions. No analogous guarantee is made for the TB.


• Not all bits of RTCL need be implemented, while all bits of the TB must be implemented.
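The difference between the two counting schemes can be seen in a C sketch of a single increment of each facility. This is a simplified model: the struct and function names are hypothetical, and it assumes every RTCL bit is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* POWER RTC: RTCU counts seconds, RTCL counts nanoseconds. */
typedef struct { uint32_t rtcu, rtcl; } rtc_t;

/* Add one nanosecond: RTCL wraps after 999_999_999 and carries into RTCU,
   so the pair is not a single binary counter. */
static void rtc_increment(rtc_t *r)
{
    if (r->rtcl == 999999999u) {
        r->rtcl = 0;
        r->rtcu += 1;
    } else {
        r->rtcl += 1;
    }
}

/* PowerPC TB: TBU:TBL is one 64-bit binary counter. */
typedef struct { uint32_t tbu, tbl; } tb_t;

static void tb_increment(tb_t *t)
{
    t->tbl += 1;          /* wraps from 0xFFFF_FFFF to 0 */
    if (t->tbl == 0)
        t->tbu += 1;      /* carry into TBU */
}
```

Starting from a low word of 999_999_999, the RTC model carries into the seconds word, while the TB model simply moves on to 1_000_000_000.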


B.25.2 Decrementer 


The decrementer (DEC) register differs between the PowerPC and POWER architectures in the following respects:


• The PowerPC architecture DEC register decrements at the same rate that the TB increments, while the POWER decrementer decrements every nanosecond (which is the same rate that the RTC increments).


• Not all bits of the POWER DEC need be implemented, while all bits of the PowerPC DEC must be implemented.


• The exception caused by the DEC has its own exception vector location in the PowerPC architecture, but is considered an external exception in the POWER architecture.


B.26 Deleted Instructions 


The following instructions, shown in Table B-2, are part of the POWER architecture but 
have been dropped from the PowerPC architecture. 


Table B-2. Deleted POWER Instructions 


Mnemonic   Instruction              Primary Opcode   Extended Opcode
cli        Cache Line Invalidate    31

Note: Many of these instructions use the MQ register. The MQ is not defined in the 
PowerPC architecture. 
B.27 POWER Instructions Supported by the PowerPC Architecture
Table B-3 lists the POWER instructions implemented in the PowerPC architecture. 


Table B-3. POWER Instructions Implemented in PowerPC Architecture 




Instruction Cache Synchronize    isync
Load                             Load Word and Zero
Load Byte-Reverse Indexed        Load Word Byte-Reverse Indexed




Load Multiple                    Load Multiple Word
Load String Immediate            Load String Word Immediate
Load String Indexed              Load String Word Indexed
Load with Update                 Load Word and Zero with Update
Load with Update Indexed         Load Word and Zero with Update Indexed
rlimix  Rotate Left Immediate then Mask Insert       rlwimix  Rotate Left Word Immediate then Mask Insert
rlinmx  Rotate Left Immediate then AND with Mask     rlwinmx  Rotate Left Word Immediate then AND with Mask
sfex
Store with Update Indexed        Store Word with Update Indexed
Store Indexed                    Store Word Indexed
Supervisor Call                  System Call
Trap Immediate                   Trap Word Immediate
TLB Invalidate Entry             Translation Lookaside Buffer Invalidate Entry *
XOR Immediate Lower              XOR Immediate
XOR Immediate Upper              XOR Immediate Shifted


* Supervisor-level instruction 







Appendix C 
Multiple-Precision Shifts 


This appendix gives examples of how multiple-precision shifts can be programmed. A multiple-precision shift is initially defined to be a shift of an n-word quantity, where n > 1. The quantity to be shifted is contained in n registers. The shift amount is specified either by an immediate value in the instruction or by bits 26-31 of a register.


The examples shown below distinguish between the cases n = 2 and n > 2. If n > 2, however, the shift amount must be in the range 0-31 for the examples to yield the desired result. The specific instance shown for n > 2 is n = 3; extending those instruction sequences to larger n is straightforward, as is reducing them to the case n = 2 when the more stringent restriction on shift amount is met. For shifts with immediate shift amounts, only the case n = 3 is shown because the more stringent restriction on shift amount is always met.


In the examples, it is assumed that GPRs 2 and 3 (and 4) contain the quantity to be shifted, and that the result is to be placed into the same registers. For non-immediate shifts, the shift amount is assumed to be in bits 26-31 of GPR6. For immediate shifts, the shift amount is assumed to be greater than zero. GPRs 0 and 31 are used as scratch registers. For n > 2, the number of instructions required is 2n - 1 (immediate shifts) or 3n - 1 (non-immediate shifts).


The following sections provide examples of multiple-precision shifts. 
C.1 Multiple-Precision Shifts in 32-Bit Implementations


Shift Left Immediate, n = 3 (Shift Amount < 32) 
rlwinm r2,r2,sh,0,31 - sh
rlwimi r2,r3,sh,32 - sh,31
rlwinm r3,r3,sh,0,31 - sh
rlwimi r3,r4,sh,32 - sh,31
rlwinm r4,r4,sh,0,31 - sh
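The immediate forms rely on the rlwinm and rlwimi masks. The C sketch below is a hypothetical software model (rotl32, ppc_mask, rlwinm, rlwimi, and shl96_imm are illustrative names, not a library API). It handles only non-wrapping masks (MB ≤ ME), which is all these sequences need for 0 < sh < 32. Word r[0] plays the role of GPR2.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* MASK(mb, me) for the non-wrapping case mb <= me: ones in bits mb..me,
   where bit 0 is the most-significant bit. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    assert(mb <= me && me <= 31);
    return (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31 - me));
}

/* rlwinm: rotate, then AND with the mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* rlwimi: rotate, then insert under the mask, keeping the other bits of rA. */
static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* Shift Left Immediate, n = 3, for 0 < sh < 32.
   r[0], r[1], r[2] stand for GPR2 (most significant), GPR3, GPR4. */
static void shl96_imm(uint32_t r[3], unsigned sh)
{
    r[0] = rlwinm(r[0], sh, 0, 31 - sh);
    r[0] = rlwimi(r[0], r[1], sh, 32 - sh, 31);
    r[1] = rlwinm(r[1], sh, 0, 31 - sh);
    r[1] = rlwimi(r[1], r[2], sh, 32 - sh, 31);
    r[2] = rlwinm(r[2], sh, 0, 31 - sh);
}
```

Each rlwinm keeps the upper 32 - sh bits of a word shifted left, and each following rlwimi drops the top sh bits of the next lower word into the vacated low-order positions.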


Shift Left, n = 2 (Shift Amount < 64) 
subfic r31,r6,32
slw    r2,r2,r6
srw    r0,r3,r31
or     r2,r2,r0
addi   r31,r6,-32
slw    r0,r3,r31
or     r2,r2,r0
slw    r3,r3,r6
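Because slw and srw take the shift count from the low-order six bits of rB and return zero for counts of 32 to 63, one sequence covers the whole range 0 to 63. The C sketch below replays the n = 2 Shift Left sequence instruction by instruction; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* slw/srw: count is the low-order six bits of rB; 32-63 gives zero. */
static uint32_t slw(uint32_t rs, uint32_t rb)
{
    unsigned n = rb & 0x3F;
    return n > 31 ? 0 : rs << n;
}

static uint32_t srw(uint32_t rs, uint32_t rb)
{
    unsigned n = rb & 0x3F;
    return n > 31 ? 0 : rs >> n;
}

/* Shift Left, n = 2: *hi is GPR2, *lo is GPR3, r6 is GPR6 (0-63). */
static void shl64(uint32_t *hi, uint32_t *lo, uint32_t r6)
{
    uint32_t r31 = 32 - r6;          /* subfic r31,r6,32 */
    *hi = slw(*hi, r6);              /* slw    r2,r2,r6  */
    uint32_t r0 = srw(*lo, r31);     /* srw    r0,r3,r31 */
    *hi |= r0;                       /* or     r2,r2,r0  */
    r31 = r6 - 32;                   /* addi   r31,r6,-32 */
    r0 = slw(*lo, r31);              /* slw    r0,r3,r31 */
    *hi |= r0;                       /* or     r2,r2,r0  */
    *lo = slw(*lo, r6);              /* slw    r3,r3,r6  */
}
```

Comparing the result with a native 64-bit shift for every amount from 0 to 63 shows that only one of the two OR terms is nonzero for any amount except exactly 32, where both produce the same word.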


Shift Left, n = 3 (Shift Amount < 32) 
subfic r31,r6,32
slw    r2,r2,r6
srw    r0,r3,r31
or     r2,r2,r0
slw    r3,r3,r6
srw    r0,r4,r31
or     r3,r3,r0
slw    r4,r4,r6


Shift Right Immediate, n = 3 (Shift Amount < 32)
rlwinm r4,r4,32 - sh,sh,31
rlwimi r4,r3,32 - sh,0,sh - 1
rlwinm r3,r3,32 - sh,sh,31
rlwimi r3,r2,32 - sh,0,sh - 1
rlwinm r2,r2,32 - sh,sh,31


Shift Right, n = 2 (Shift Amount < 64)
subfic r31,r6,32
srw    r3,r3,r6
slw    r0,r2,r31
or     r3,r3,r0
addi   r31,r6,-32
srw    r0,r2,r31
or     r3,r3,r0
srw    r2,r2,r6


Shift Right, n = 3 (Shift Amount < 32)
subfic r31,r6,32
srw    r4,r4,r6
slw    r0,r3,r31
or     r4,r4,r0
srw    r3,r3,r6
slw    r0,r2,r31
or     r3,r3,r0
srw    r2,r2,r6
Shift Right Algebraic Immediate, n = 3 (Shift Amount < 32)
rlwinm r4,r4,32 - sh,sh,31
rlwimi r4,r3,32 - sh,0,sh - 1
rlwinm r3,r3,32 - sh,sh,31
rlwimi r3,r2,32 - sh,0,sh - 1
srawi  r2,r2,sh


Shift Right Algebraic, n = 2 (Shift Amount < 64)
subfic r31,r6,32
srw    r3,r3,r6
slw    r0,r2,r31
or     r3,r3,r0
addic. r31,r6,-32
sraw   r0,r2,r31
ble    $+8
ori    r3,r0,0
sraw   r2,r2,r6
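The algebraic version needs the branch because sraw with a count of 32 to 63 returns copies of the sign bit rather than zero, so its contribution can only be used when the shift amount exceeds 32. The C sketch below is a hypothetical model (helper names are illustrative) that replays the n = 2 sequence, with the addic./ble pair modeled as a signed "greater than zero" test on r31.

```c
#include <assert.h>
#include <stdint.h>

/* Shift instructions take the count from the low-order six bits of rB. */
static uint32_t srw(uint32_t rs, uint32_t rb)
{
    unsigned n = rb & 0x3F;
    return n > 31 ? 0 : rs >> n;            /* 32-63: result is zero */
}

static uint32_t slw(uint32_t rs, uint32_t rb)
{
    unsigned n = rb & 0x3F;
    return n > 31 ? 0 : rs << n;
}

static uint32_t sraw(uint32_t rs, uint32_t rb)
{
    unsigned n = rb & 0x3F;
    uint32_t fill = (rs & 0x80000000u) ? 0xFFFFFFFFu : 0;   /* copies of the sign bit */
    if (n > 31)
        return fill;                        /* 32-63: every bit is the sign */
    if (n == 0)
        return rs;
    return (rs >> n) | (fill << (32 - n));  /* sign bits enter from the left */
}

/* Shift Right Algebraic, n = 2: *hi is GPR2, *lo is GPR3, r6 is GPR6 (0-63). */
static void sra64(uint32_t *hi, uint32_t *lo, uint32_t r6)
{
    uint32_t r31 = 32 - r6;                 /* subfic r31,r6,32   */
    *lo = srw(*lo, r6);                     /* srw    r3,r3,r6    */
    uint32_t r0 = slw(*hi, r31);            /* slw    r0,r2,r31   */
    *lo |= r0;                              /* or     r3,r3,r0    */
    r31 = r6 - 32;                          /* addic. r31,r6,-32  */
    r0 = sraw(*hi, r31);                    /* sraw   r0,r2,r31   */
    int gt = r31 != 0 && (r31 & 0x80000000u) == 0;   /* CR0[GT]: r31 > 0 as signed */
    if (gt)                                 /* ble $+8 skips the ori otherwise */
        *lo = r0;                           /* ori    r3,r0,0     */
    *hi = sraw(*hi, r6);                    /* sraw   r2,r2,r6    */
}
```

The model can be checked against a reference 64-bit arithmetic shift built from unsigned operations, for both a negative and a positive operand.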


Shift Right Algebraic, n = 3 (Shift Amount < 32)
subfic r31,r6,32
srw    r4,r4,r6
slw    r0,r3,r31
or     r4,r4,r0
srw    r3,r3,r6
slw    r0,r2,r31
or     r3,r3,r0
sraw   r2,r2,r6
Appendix D
Floating-Point Models 


This appendix describes the execution model for IEEE operations, gives examples of how the floating-point conversion instructions can be used to perform various conversions, and provides models for floating-point instructions.


D.1 Execution Model for IEEE Operations 


The following description uses double-precision arithmetic as an example; single-precision 
arithmetic is similar except that the fraction field is a 23-bit field and the single-precision 
guard, round, and sticky bits (described in this section) are logically adjacent to the 23-bit 
FRACTION field. 


IEEE-conforming significand arithmetic is performed with a floating-point accumulator 
where bits 0-55, shown in Figure D-1, comprise the significand of the intermediate result. 


| S | C | L |     FRACTION     | G | R | X |
          0  1               52 53  54  55


Figure D-1. IEEE 64-Bit Execution Model 





The bits and fields for the IEEE double-precision execution model are defined as follows: 
• The S bit is the sign bit.
• The C bit is the carry bit that captures the carry out of the significand.
• The L bit is the leading unit bit of the significand that receives the implicit bit from the operands.
• The FRACTION is a 52-bit field that accepts the fraction of the operands.
• The guard (G), round (R), and sticky (X) bits are extensions to the low-order bits of the accumulator. The G and R bits are required for postnormalization of the result. The G, R, and X bits are required during rounding to determine if the intermediate result is equally near the two nearest representable values. The X bit serves as an extension to the G and R bits by representing the logical OR of all bits that may appear to the low-order side of the R bit, due to either shifting the accumulator right or to other generation of low-order result bits. The G and R bits participate in the left shifts with zeros being shifted into the R bit.


Table D-1 shows the significance of the G, R, and X bits with respect to the intermediate 
result (IR), the next lower in magnitude representable number (NL), and the next higher in 
magnitude representable number (NH). 


Table D-1. Interpretation of G, R, and X Bits 


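G  R  X    Interpretation
0  0  0    IR is exact
0  0  1    IR closer to NL
0  1  0    IR closer to NL
0  1  1    IR closer to NL
1  0  0    IR midway between NL and NH
1  0  1    IR closer to NH
1  1  0    IR closer to NH
1  1  1    IR closer to NH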





The significand of the intermediate result is made up of the L bit, the FRACTION, and the 
G, R, and X bits. 


The infinitely precise intermediate result of an operation is the result normalized in bits L, 
FRACTION, G, R, and X of the floating-point accumulator. 


After normalization, the intermediate result is rounded, using the rounding mode specified 
by FPSCR[RN]. If rounding causes a carry into C, the significand is shifted right one 
position and the exponent is incremented by one. This causes an inexact result and possibly 
exponent overflow. Fraction bits to the left of the bit position used for rounding are stored 
into the FPR, and low-order bit positions, if any, are set to zero. 


Four user-selectable rounding modes are provided through FPSCR[RN] as described in 
Section 3.3.5, “Rounding.” For rounding, the conceptual guard, round, and sticky bits are 
defined in terms of accumulator bits. 


Table D-2 shows the positions of the guard, round, and sticky bits for double-precision and 
single-precision floating-point numbers in the IEEE execution model. 
Table D-2. Location of the Guard, Round, and Sticky Bits—IEEE Execution Model

Format   Guard   Round   Sticky
Double   G bit   R bit   X bit
Single   24      25      OR of 26-52, G, R, X


Rounding can be treated as though the significand were shifted right, if required, until the 
least-significant bit to be retained is in the low-order bit position of the FRACTION. If any 
of the guard, round, or sticky bits are nonzero, the result is inexact. 





Z1 and Z2, defined in Section 3.3.5, “Rounding,” can be used to approximate the result in 
the target format when one of the following rules is used: 


• Round to nearest

  – Guard bit = 0: The result is truncated (the result is exact (GRX = 000) or closest to the next lower value in magnitude (GRX = 001, 010, or 011)).

  – Guard bit = 1: Depends on the round and sticky bits:

    Case a: If the round or sticky bit is one (inclusive), the result is incremented (the result is closest to the next higher value in magnitude (GRX = 101, 110, or 111)).

    Case b: If the round and sticky bits are zero (the result is midway between the closest representable values), then if the low-order bit of the result is one, the result is incremented. Otherwise (the low-order bit of the result is zero), the result is truncated (this is the case of a tie rounded to even).

  If, during the round-to-nearest process, truncation of the unrounded number produces the maximum magnitude for the specified precision, the following action is taken:

  – Guard bit = 1: Store infinity with the sign of the unrounded result.

  – Guard bit = 0: Store the truncated (maximum magnitude) value.

• Round toward zero: Choose the smaller in magnitude of Z1 or Z2. If the guard, round, or sticky bit is nonzero, the result is inexact.

• Round toward +infinity: Choose Z1.

• Round toward -infinity: Choose Z2.
Where the result is to have fewer than 53 bits of precision because the instruction is a floating round to single-precision or single-precision arithmetic instruction, the intermediate result either is normalized or is placed in correct denormalized form before being rounded.
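The rounding rules can be condensed into a few lines of C. The sketch below is a hypothetical helper, not architected behavior: it rounds an integer significand magnitude by dropping its low-order bits, taking G and R as the first two dropped bits and X as the OR of the rest, and it uses the FPSCR[RN] encoding (00 nearest, 01 toward zero, 10 toward +infinity, 11 toward -infinity). Choosing Z1 or Z2 is expressed as incrementing the magnitude only when the rounding direction points away from zero. A carry out of the kept field and the maximum-magnitude case are left to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* FPSCR[RN] encodings. */
enum { RN_NEAREST = 0, RN_ZERO = 1, RN_POS_INF = 2, RN_NEG_INF = 3 };

/* Round the magnitude `sig` by discarding its low `drop` bits (2 <= drop < 64).
   `negative` is the sign of the value; *inexact is set when G|R|X != 0. */
static uint64_t round_grx(uint64_t sig, unsigned drop, unsigned rn,
                          int negative, int *inexact)
{
    uint64_t kept = sig >> drop;
    unsigned g = (unsigned)(sig >> (drop - 1)) & 1u;            /* guard  */
    unsigned r = (unsigned)(sig >> (drop - 2)) & 1u;            /* round  */
    unsigned x = (sig & ((1ull << (drop - 2)) - 1)) != 0;       /* sticky */
    int inc = 0;

    *inexact = (g | r | x) != 0;
    switch (rn) {
    case RN_NEAREST:   /* G = 0: truncate; G = 1: up if R|X, or tie with odd lsb */
        inc = g && (r || x || (kept & 1));
        break;
    case RN_ZERO:      /* smaller magnitude: always truncate */
        inc = 0;
        break;
    case RN_POS_INF:   /* Z1: magnitude grows only for positive values */
        inc = !negative && *inexact;
        break;
    case RN_NEG_INF:   /* Z2: magnitude grows only for negative values */
        inc = negative && *inexact;
        break;
    }
    return kept + (uint64_t)inc;
}
```

For example, with three dropped bits, 0b1011_100 (a tie with an odd kept value) rounds to nearest as 0b1100, while 0b1010_100 (a tie with an even kept value) stays 0b1010.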
D.2 Execution Model for Multiply-Add Type Instructions


The PowerPC architecture makes use of a special instruction form that performs up to three operations in one instruction (a multiply, an add, and a negate). With this added capability comes the ability to produce a more exact intermediate result as an input to the rounder. Single-precision arithmetic is similar except that the fraction field is smaller. Note that rounding occurs only after the add; therefore, the computation of the sum and product together is infinitely precise before the final result is rounded to a representable format.


The multiply-add significand arithmetic is considered to be performed with a floating-point 
accumulator, where bits 1-106 comprise the significand of the intermediate result. The 
format is shown in Figure D-2. 


| S | C | L |     FRACTION     | X' |
          0  1              105





Figure D-2. Multiply-Add 64-Bit Execution Model 


The first part of the operation is a multiply. The multiply has two 53-bit significands as inputs, which are assumed to be prenormalized, and produces a result conforming to the above model. If there is a carry out of the significand (into the C bit), the significand is shifted right one position, placing the L bit into the most-significant bit of the FRACTION and placing the C bit into the L bit. All 106 bits (L bit plus the fraction) of the product take part in the add operation. If the exponents of the two inputs to the adder are not equal, the significand of the operand with the smaller exponent is aligned by shifting it to the right by the amount that must be added to its exponent to make that exponent equal to the other input’s exponent. Zeros are shifted into the left of the significand as it is aligned, and bits shifted out of bit 105 of the significand are ORed into the X' bit. The add operation also produces a result conforming to the above model, with the X' bit taking part in the add operation.


The result of the add is then normalized, with all bits of the add result, except the X' bit, 
participating in the shift. The normalized result serves as the intermediate result that is input 
to the rounder. 


For rounding, the conceptual guard, round, and sticky bits are defined in terms of 
accumulator bits. Table D-3 shows the positions of the guard, round, and sticky bits for 
double-precision and single-precision floating-point numbers in the multiply-add execution 
model. 


Table D-3. Location of the Guard, Round, and Sticky Bits—Multiply-Add Execution 
Model 


Format   Guard   Round   Sticky
Double   53      54      OR of 55-105, X'
Single   24      25      OR of 26-105, X'
The rules for rounding the intermediate result are the same as those given in Section D.1, “Execution Model for IEEE Operations.”


If the instruction is floating negative multiply-add or floating negative multiply-subtract, 
the final result is negated. 


Floating-point multiply-add instructions combine a multiply and an add operation without 
an intermediate rounding operation. The fraction part of the intermediate product is 106 bits 
wide, and all 106 bits take part in the add/subtract portion of the instruction. 
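The effect of omitting the intermediate rounding can be illustrated in C. This is an illustration under stated assumptions, not a model of the hardware: the single-precision inputs are chosen so that the exact product 1 + 2^-11 + 2^-24 and the final sum are exactly representable in double, so evaluating in double and converting once to float behaves like a single-precision multiply-add for these particular values. The function names are hypothetical.

```c
#include <assert.h>

/* Multiply, round to single, then add: two roundings. */
static float separate_mul_then_add(float a, float b, float c)
{
    volatile float p = a * b;   /* volatile forces the product to be rounded to float */
    return p + c;
}

/* Exact product and sum (exact for the chosen inputs), then one rounding. */
static float single_rounding_mul_add(float a, float b, float c)
{
    double exact = (double)a * (double)b + (double)c;
    return (float)exact;
}
```

With a = 1 + 2^-12 and c = -(1 + 2^-11), the single-precision product rounds the tie 1 + 2^-11 + 2^-24 down to 1 + 2^-11 (round to even), so the separate sequence returns 0, whereas the singly rounded version returns 2^-24.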


Status bits are set as follows: 


• Overflow, underflow, and inexact exception bits, the FR and FI bits, and the FPRF field are set based on the final result of the operation, and not on the result of the multiplication.

• Invalid operation exception bits are set as if the multiplication and the addition were performed using two separate instructions (for example, an fmul instruction followed by an fadd instruction). That is, multiplication of infinity by 0, or of anything by an SNaN, causes the corresponding exception bits to be set.


D.3 Floating-Point Conversions 


This section provides examples of floating-point conversion instructions. Note that some of 
the examples use the optional Floating Select (fsel) instruction. Care must be taken in using 
fsel if IEEE compatibility is required, or if the values being tested can be NaNs or infinities. 


D.3.1 Conversion from Floating-Point Number to Signed Fixed-Point 
Integer Word 


The full convert to signed fixed-point integer word function can be implemented with the 
following sequence, assuming that the floating-point value to be converted is in FPR1, the 
result is returned in GPR3, and a double word at displacement (disp) from the address in 
GPR1 can be used as scratch space.


fctiw[z] f2,f1           #convert to fx int
stfd     f2,disp(r1)     #store float
lwa      r3,disp + 4(r1) #load word algebraic
                         #(use lwz on a 32-bit implementation)
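The load uses disp + 4 because fctiw[z] places the 32-bit result in the low-order word of the FPR (the high-order word is undefined), and stfd writes the FPR in big-endian byte order. The C sketch below, assuming big-endian memory as in the sequence above and using hypothetical helper names, shows that the word at offset 4 of the stored double word is the low-order word. The high-order word in the example is an arbitrary stand-in for the undefined bits.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 64-bit FPR image in big-endian byte order, as stfd does. */
static void store_be64(uint8_t mem[8], uint64_t v)
{
    for (int i = 0; i < 8; i++)
        mem[i] = (uint8_t)(v >> (56 - 8 * i));   /* most-significant byte first */
}

/* Load a big-endian word, as lwz does. */
static uint32_t load_be32(const uint8_t mem[4])
{
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8) | (uint32_t)mem[3];
}
```

Loading from the scratch address plus 4 therefore yields the converted integer, while loading from the scratch address itself would yield the undefined high-order word.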
D.3.2 Conversion from Floating-Point Number to Unsigned Fixed-Point Integer Word


In a 32-bit implementation, the full convert to unsigned fixed-point integer word function can be implemented with the sequence shown below, assuming that the floating-point value to be converted is in FPR1, the value zero is in FPR0, the value 2^32 - 1 is in FPR3, the value 2^31 is in FPR4, the result is returned in GPR3, and a double word at displacement (disp) from the address in GPR1 can be used as scratch space.





fsel     f2,f1,f1,f0     #use 0 if < 0
fsub     f5,f3,f1        #use max if > max
fsel     f2,f5,f2,f3
fsub     f5,f2,f4        #subtract 2**31
fcmpu    cr2,f2,f4       #use diff if >= 2**31
fsel     f2,f5,f5,f2
fctiw[z] f2,f2           #convert to fx int
stfd     f2,disp(r1)     #store float
lwz      r3,disp + 4(r1) #load word
blt      cr2,$+8         #add 2**31 if input
xoris    r3,r3,0x8000    #was >= 2**31
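The sequence can be traced with a C model. It assumes the fctiwz form (round toward zero) and a non-NaN input, and uses hypothetical helper names. fsel is modeled as frD = (frA >= 0) ? frC : frB, with the helper's arguments in assembler order (frA, frC, frB).

```c
#include <assert.h>
#include <stdint.h>

/* fsel frD,frA,frC,frB: frD = (frA >= 0.0) ? frC : frB.
   A NaN in frA selects frB because the comparison is false. */
static double fsel(double a, double c, double b)
{
    return a >= 0.0 ? c : b;
}

/* Model of the fctiwz variant of the unsigned conversion sequence. */
static uint32_t fp_to_u32(double f1)
{
    const double f0 = 0.0, f3 = 4294967295.0, f4 = 2147483648.0;
    double f2 = fsel(f1, f1, f0);          /* use 0 if < 0            */
    double f5 = f3 - f1;                   /* use max if > max        */
    f2 = fsel(f5, f2, f3);
    f5 = f2 - f4;                          /* subtract 2**31          */
    int lt = f2 < f4;                      /* fcmpu cr2,f2,f4         */
    f2 = fsel(f5, f5, f2);                 /* use diff if >= 2**31    */
    uint32_t r3 = (uint32_t)(int32_t)f2;   /* fctiwz: f2 is in [0, 2**31) here */
    if (!lt)
        r3 ^= 0x80000000u;                 /* xoris r3,r3,0x8000 adds 2**31 back */
    return r3;
}
```

For an input of 3e9, the model subtracts 2^31, converts 852_516_352, and then sets the high-order bit to return 3_000_000_000; inputs above 2^32 - 1 clamp to 0xFFFF_FFFF and negative inputs clamp to 0.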





D.4 Floating-Point Models 


This section describes models for floating-point instructions. 


D.4.1 Floating-Point Round to Single-Precision Model 


The following algorithm describes the operation of the Floating Round to Single-Precision 
(frsp) instruction. 


If frB[1-11] < 897 and frB[1-63] > 0 then
  Do
    If FPSCR[UE] = 0 then goto Disabled Exponent Underflow
    If FPSCR[UE] = 1 then goto Enabled Exponent Underflow
  End

If frB[1-11] > 1150 and frB[1-11] < 2047 then
  Do
    If FPSCR[OE] = 0 then goto Disabled Exponent Overflow
    If FPSCR[OE] = 1 then goto Enabled Exponent Overflow
  End

If frB[1-11] > 896 and frB[1-11] < 1151 then goto Normal Operand
If frB[1-63] = 0 then goto Zero Operand

If frB[1-11] = 2047 then
  Do
    If frB[12-63] = 0 then goto Infinity Operand
    If frB[12] = 1 then goto QNaN Operand
    If frB[12] = 0 and frB[13-63] > 0 then goto SNaN Operand
  End


Disabled Exponent Underflow:

sign ← frB[0]
If frB[1-11] = 0 then
  Do
    exp ← -1022
    frac[0-52] ← 0b0 || frB[12-63]
  End
If frB[1-11] > 0 then
  Do
    exp ← frB[1-11] - 1023
    frac[0-52] ← 0b1 || frB[12-63]
  End
Denormalize operand:
  G || R || X ← 0b000
  Do while exp < -126
    exp ← exp + 1
    frac[0-52] || G || R || X ← 0b0 || frac || G || (R | X)
  End
FPSCR[UX] ← (frac[24-52] || G || R || X) > 0
Round Single(sign,exp,frac[0-52],G,R,X)
FPSCR[XX] ← FPSCR[XX] | FPSCR[FI]
If frac[0-52] = 0 then
  Do
    frD[0] ← sign
    frD[1-63] ← 0
    If sign = 0 then FPSCR[FPRF] ← “+zero”
    If sign = 1 then FPSCR[FPRF] ← “-zero”
  End
If frac[0-52] > 0 then
  Do
    If frac[0] = 1 then
      Do
        If sign = 0 then FPSCR[FPRF] ← “+normal number”
        If sign = 1 then FPSCR[FPRF] ← “-normal number”
      End
    If frac[0] = 0 then
      Do
        If sign = 0 then FPSCR[FPRF] ← “+denormalized number”
        If sign = 1 then FPSCR[FPRF] ← “-denormalized number”
      End
    Normalize operand:
      Do while frac[0] = 0
        exp ← exp - 1
        frac[0-52] ← frac[1-52] || 0b0
      End
    frD[0] ← sign
    frD[1-11] ← exp + 1023
    frD[12-63] ← frac[1-52]
  End
Done


Enabled Exponent Underflow

FPSCR[UX] ← 1
sign ← frB[0]
If frB[1-11] = 0 then
  Do
    exp ← -1022
    frac[0-52] ← 0b0 || frB[12-63]
  End
If frB[1-11] > 0 then
  Do
    exp ← frB[1-11] - 1023
    frac[0-52] ← 0b1 || frB[12-63]
  End
Normalize operand:
  Do while frac[0] = 0
    exp ← exp - 1
    frac[0-52] ← frac[1-52] || 0b0
  End
Round Single(sign,exp,frac[0-52],0,0,0)
FPSCR[XX] ← FPSCR[XX] | FPSCR[FI]
exp ← exp + 192
frD[0] ← sign
frD[1-11] ← exp + 1023
frD[12-63] ← frac[1-52]
If sign = 0 then FPSCR[FPRF] ← “+normal number”
If sign = 1 then FPSCR[FPRF] ← “-normal number”
Done


Disabled Exponent Overflow

FPSCR[OX] ← 1
If FPSCR[RN] = 0b00 then /* Round to Nearest */
  Do
    If frB[0] = 0 then frD ← 0x7FF0_0000_0000_0000
    If frB[0] = 1 then frD ← 0xFFF0_0000_0000_0000
    If frB[0] = 0 then FPSCR[FPRF] ← “+infinity”
    If frB[0] = 1 then FPSCR[FPRF] ← “-infinity”
  End
If FPSCR[RN] = 0b01 then /* Round Truncate */
  Do
    If frB[0] = 0 then frD ← 0x47EF_FFFF_E000_0000
    If frB[0] = 1 then frD ← 0xC7EF_FFFF_E000_0000
    If frB[0] = 0 then FPSCR[FPRF] ← “+normal number”
    If frB[0] = 1 then FPSCR[FPRF] ← “-normal number”
  End
If FPSCR[RN] = 0b10 then /* Round to +Infinity */
  Do
    If frB[0] = 0 then frD ← 0x7FF0_0000_0000_0000
    If frB[0] = 1 then frD ← 0xC7EF_FFFF_E000_0000
    If frB[0] = 0 then FPSCR[FPRF] ← “+infinity”
    If frB[0] = 1 then FPSCR[FPRF] ← “-normal number”
  End
If FPSCR[RN] = 0b11 then /* Round to -Infinity */
  Do
    If frB[0] = 0 then frD ← 0x47EF_FFFF_E000_0000
    If frB[0] = 1 then frD ← 0xFFF0_0000_0000_0000
    If frB[0] = 0 then FPSCR[FPRF] ← “+normal number”
    If frB[0] = 1 then FPSCR[FPRF] ← “-infinity”
  End
FPSCR[FR] ← undefined
FPSCR[FI] ← 1
FPSCR[XX] ← 1
Done








Enabled Exponent Overflow

sign ← frB[0]
exp ← frB[1-11] - 1023
frac[0-52] ← 0b1 || frB[12-63]
Round Single(sign,exp,frac[0-52],0,0,0)
FPSCR[XX] ← FPSCR[XX] | FPSCR[FI]
Enabled Overflow
FPSCR[OX] ← 1
exp ← exp - 192
frD[0] ← sign
frD[1-11] ← exp + 1023
frD[12-63] ← frac[1-52]
If sign = 0 then FPSCR[FPRF] ← “+normal number”
If sign = 1 then FPSCR[FPRF] ← “-normal number”
Done


Zero Operand

frD ← frB
If frB[0] = 0 then FPSCR[FPRF] ← “+zero”
If frB[0] = 1 then FPSCR[FPRF] ← “-zero”
FPSCR[FR FI] ← 0b00
Done
Infinity Operand

frD ← frB
If frB[0] = 0 then FPSCR[FPRF] ← “+infinity”
If frB[0] = 1 then FPSCR[FPRF] ← “-infinity”
Done

QNaN Operand:

frD ← frB[0-34] || 0b0_0000_0000_0000_0000_0000_0000_0000
FPSCR[FPRF] ← “QNaN”
FPSCR[FR FI] ← 0b00
Done


SNaN Operand

FPSCR[VXSNAN] ← 1
If FPSCR[VE] = 0 then
  Do
    frD[0-11] ← frB[0-11]
    frD[12] ← 1
    frD[13-63] ← frB[13-34] || 0b0_0000_0000_0000_0000_0000_0000_0000
    FPSCR[FPRF] ← “QNaN”
  End
FPSCR[FR FI] ← 0b00
Done


Normal Operand

sign ← frB[0]
exp ← frB[1-11] - 1023
frac[0-52] ← 0b1 || frB[12-63]
Round Single(sign,exp,frac[0-52],0,0,0)
FPSCR[XX] ← FPSCR[XX] | FPSCR[FI]
If exp > +127 and FPSCR[OE] = 0 then go to Disabled Exponent Overflow
If exp > +127 and FPSCR[OE] = 1 then go to Enabled Overflow
frD[0] ← sign
frD[1-11] ← exp + 1023
frD[12-63] ← frac[1-52]
If sign = 0 then FPSCR[FPRF] ← “+normal number”
If sign = 1 then FPSCR[FPRF] ← “-normal number”
Done

Round Single(sign,exp,frac[0-52],G,R,X)

inc ← 0
lsb ← frac[23]
gbit ← frac[24]
rbit ← frac[25]
xbit ← (frac[26-52] || G || R || X) ≠ 0
If FPSCR[RN] = 0b00 then
  Do
    If sign || lsb || gbit || rbit || xbit = 0bu11uu then inc ← 1
    If sign || lsb || gbit || rbit || xbit = 0bu011u then inc ← 1
    If sign || lsb || gbit || rbit || xbit = 0bu01u1 then inc ← 1
  End
If FPSCR[RN] = 0b10 then
  Do
    If sign || lsb || gbit || rbit || xbit = 0b0u1uu then inc ← 1
    If sign || lsb || gbit || rbit || xbit = 0b0uu1u then inc ← 1
    If sign || lsb || gbit || rbit || xbit = 0b0uuu1 then inc ← 1
  End
If FPSCR[RN] = 0b11 then
  Do
    If sign || lsb || gbit || rbit || xbit = 0b1u1uu then inc ← 1
    If sign || lsb || gbit || rbit || xbit = 0b1uu1u then inc ← 1
    If sign || lsb || gbit || rbit || xbit = 0b1uuu1 then inc ← 1
  End
frac[0-23] ← frac[0-23] + inc
If carry_out = 1 then
  Do
    frac[0-23] ← 0b1 || frac[0-22]
    exp ← exp + 1
  End
frac[24-52] ← (29)0
FPSCR[FR] ← inc
FPSCR[FI] ← gbit | rbit | xbit
Return
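The Normal Operand path and the Round Single routine can be cross-checked against a C compiler's own double-to-float conversion. The sketch below is a hypothetical model, not the hardware algorithm verbatim: it handles only doubles whose exponent stays in the single-precision normal range (so G = R = X = 0 on entry), uses the FPSCR[RN] encoding for rn, and returns the single-precision bit image rather than the double-format frD value so the result is easy to compare. Function and macro names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* frac[k] in the manual's numbering (bit 0 = MSB of 53 bits) is integer bit 52 - k. */
#define FRAC_BIT(f, k) ((unsigned)(((f) >> (52 - (k))) & 1u))

/* Round Single with G = R = X = 0. Returns frac[0-23]; bumps *exp on carry. */
static uint32_t round_single(unsigned sign, int *exp, uint64_t frac,
                             unsigned rn, unsigned *fr, unsigned *fi)
{
    unsigned lsb  = FRAC_BIT(frac, 23);
    unsigned gbit = FRAC_BIT(frac, 24);
    unsigned rbit = FRAC_BIT(frac, 25);
    unsigned xbit = (frac & ((1ull << 27) - 1)) != 0;   /* frac[26-52] != 0 */
    unsigned inc = 0;

    if (rn == 0)          /* patterns u11uu, u011u, u01u1 */
        inc = gbit && (lsb || rbit || xbit);
    else if (rn == 2)     /* patterns 0u1uu, 0uu1u, 0uuu1 */
        inc = sign == 0 && (gbit || rbit || xbit);
    else if (rn == 3)     /* patterns 1u1uu, 1uu1u, 1uuu1 */
        inc = sign == 1 && (gbit || rbit || xbit);

    uint32_t kept = (uint32_t)(frac >> 29) + inc;       /* frac[0-23] + inc */
    if (kept >> 24) {                                   /* carry_out = 1 */
        kept >>= 1;                                     /* 0b1 || frac[0-22] */
        *exp += 1;
    }
    *fr = inc;
    *fi = gbit | rbit | xbit;
    return kept;
}

/* Normal Operand path for a double in the single-precision normal range. */
static uint32_t frsp_normal(double x, unsigned rn)
{
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    unsigned sign = (unsigned)(b >> 63);
    int exp = (int)((b >> 52) & 0x7FF) - 1023;
    uint64_t frac = (1ull << 52) | (b & ((1ull << 52) - 1));   /* 0b1 || frB[12-63] */
    unsigned fr, fi;
    uint32_t kept = round_single(sign, &exp, frac, rn, &fr, &fi);
    return ((uint32_t)sign << 31) | ((uint32_t)(exp + 127) << 23) | (kept & 0x7FFFFFu);
}

static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

In round-to-nearest mode the model should agree with a plain (float) cast for normal values; the directed modes can be checked on a value such as 1 + 2^-23 + 2^-24, which lies exactly halfway between two single-precision neighbors.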


D.4.2 Floating-Point Convert to Integer Model 


The following algorithm describes the operation of the floating-point convert to integer 
instructions. In this example, ‘u’ represents an undefined hexadecimal digit. 


If Floating Convert to Integer Word
  Then Do
    round_mode ← FPSCR[RN]
    tgt_precision ← “32-bit integer”
  End
If Floating Convert to Integer Word with round toward Zero
  Then Do
    round_mode ← 0b01
    tgt_precision ← “32-bit integer”
  End
If Floating Convert to Integer Double Word
  Then Do
    round_mode ← FPSCR[RN]
    tgt_precision ← “64-bit integer”
  End
If Floating Convert to Integer Double Word with Round toward Zero
  Then Do
    round_mode ← 0b01
    tgt_precision ← “64-bit integer”
  End
sign ← frB[0]
If frB[1-11] = 2047 and frB[12-63] = 0 then goto Infinity Operand
If frB[1-11] = 2047 and frB[12] = 0 then goto SNaN Operand
If frB[1-11] = 2047 and frB[12] = 1 then goto QNaN Operand
If frB[1-11] > 1054 then goto Large Operand


If frB[1-11] > 0 then exp ← frB[1-11] - 1023 /* exp - bias */
If frB[1-11] = 0 then exp ← -1022
If frB[1-11] > 0 then frac[0-64] ← 0b01 || frB[12-63] || (11)0 /* normal */
If frB[1-11] = 0 then frac[0-64] ← 0b00 || frB[12-63] || (11)0 /* denormal */





gbit || rbit || xbit ← 0b000
Do i = 1,63 - exp /* do the loop 0 times if exp = 63 */
  frac[0-64] || gbit || rbit || xbit ← 0b0 || frac[0-64] || gbit || (rbit | xbit)
End
Round Integer(sign,frac[0-64],gbit,rbit,xbit,round_mode)
If sign = 1 then frac[0-64] ← ¬frac[0-64] + 1 /* needed leading 0 for -2^64 < frB < -2^63 */
If tgt_precision = “32-bit integer” and frac[0-64] > 2^31 - 1 then goto Large Operand
If tgt_precision = “64-bit integer” and frac[0-64] > 2^63 - 1 then goto Large Operand
If tgt_precision = “32-bit integer” and frac[0-64] < -2^31 then goto Large Operand
If tgt_precision = “64-bit integer” and frac[0-64] < -2^63 then goto Large Operand
FPSCR[XX] ← FPSCR[XX] | FPSCR[FI]
If tgt_precision = “32-bit integer” then frD ← 0xuuuu_uuuu || frac[33-64]
If tgt_precision = “64-bit integer” then frD ← frac[1-64]
FPSCR[FPRF] ← undefined
Done

Round Integer(sign,frac[0-64],gbit,rbit,xbit,round_mode)

In this example, ‘u’ represents an undefined hexadecimal digit. Comparisons ignore the u bits.


inc ← 0
If round_mode = 0b00 then
  Do
    If sign || frac[64] || gbit || rbit || xbit = 0bu11uu then inc ← 1
    If sign || frac[64] || gbit || rbit || xbit = 0bu011u then inc ← 1
    If sign || frac[64] || gbit || rbit || xbit = 0bu01u1 then inc ← 1
  End
If round_mode = 0b10 then
  Do
    If sign || frac[64] || gbit || rbit || xbit = 0b0u1uu then inc ← 1
    If sign || frac[64] || gbit || rbit || xbit = 0b0uu1u then inc ← 1
    If sign || frac[64] || gbit || rbit || xbit = 0b0uuu1 then inc ← 1
  End
If round_mode = 0b11 then
  Do
    If sign || frac[64] || gbit || rbit || xbit = 0b1u1uu then inc ← 1
    If sign || frac[64] || gbit || rbit || xbit = 0b1uu1u then inc ← 1
    If sign || frac[64] || gbit || rbit || xbit = 0b1uuu1 then inc ← 1
  End
frac[0-64] ← frac[0-64] + inc
FPSCR[FR] ← inc
FPSCR[FI] ← gbit | rbit | xbit
Return


Infinity Operand

FPSCR[FR FI VXCVI] ← 0b001
If FPSCR[VE] = 0 then Do
  If tgt_precision = “32-bit integer” then
    Do
      If sign = 0 then frD ← 0xuuuu_uuuu_7FFF_FFFF
      If sign = 1 then frD ← 0xuuuu_uuuu_8000_0000
    End
  Else
    Do
      If sign = 0 then frD ← 0x7FFF_FFFF_FFFF_FFFF
      If sign = 1 then frD ← 0x8000_0000_0000_0000
    End
  FPSCR[FPRF] ← undefined
End
Done


SNaN Operand

FPSCR[FR FI VXCVI VXSNAN] ← 0b0011
If FPSCR[VE] = 0 then
  Do
    If tgt_precision = “32-bit integer” then frD ← 0xuuuu_uuuu_8000_0000
    If tgt_precision = “64-bit integer” then frD ← 0x8000_0000_0000_0000
    FPSCR[FPRF] ← undefined
  End
Done


QNaN Operand

FPSCR[FR FI VXCVI] ← 0b001
If FPSCR[VE] = 0 then
  Do
    If tgt_precision = “32-bit integer” then frD ← 0xuuuu_uuuu_8000_0000
    If tgt_precision = “64-bit integer” then frD ← 0x8000_0000_0000_0000
    FPSCR[FPRF] ← undefined
  End
Done


Large Operand 


FPSCR[FR FI VXCVI] ← 0b001
If FPSCR[VE] = 0 then Do
If tgt_precision = “32-bit integer” then
Do
If sign = 0 then frD ← 0xuuuu_uuuu_7FFF_FFFF
If sign = 1 then frD ← 0xuuuu_uuuu_8000_0000
End
Else
Do
If sign = 0 then frD ← 0x7FFF_FFFF_FFFF_FFFF
If sign = 1 then frD ← 0x8000_0000_0000_0000
End
FPSCR[FPRF] ← undefined
End
Done
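When FPSCR[VE] = 0, the Infinity, SNaN, QNaN, and Large Operand paths all produce a saturated integer. The C sketch below is a hypothetical function. It models only the low-order word of a 32-bit result for a round-toward-zero conversion (the behavior the fctiwz form is expected to have), so the rounding step reduces to C's truncating conversion. The undefined high-order word is not modeled, and no FPSCR bits are updated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: low-order word of a 32-bit convert-to-integer result,
 * rounding toward zero, with invalid-operation exceptions disabled (VE = 0). */
static uint32_t convert_to_int32_toward_zero(double v)
{
    if (v != v)                   /* SNaN or QNaN operand */
        return 0x80000000u;
    if (v >= 2147483648.0)        /* +infinity, or truncates above 0x7FFF_FFFF */
        return 0x7FFFFFFFu;
    if (v <= -2147483649.0)       /* -infinity, or truncates below 0x8000_0000 */
        return 0x80000000u;
    /* In range: the C conversion truncates toward zero (round_mode 0b01). */
    return (uint32_t)(int32_t)v;
}
```

The two thresholds are asymmetric because truncation moves values toward zero. For example, -2147483648.5 still truncates to 0x8000_0000, but -2147483649.0 does not fit and takes the Large Operand path.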


D.4.3 Floating-Point Convert from Integer Model 


The following describes, algorithmically, the operation of the floating-point convert from 
integer instructions. 


sign ← frB[0]
exp ← 63
frac[0-63] ← frB


If frac[0-63] = 0 then go to Zero Operand
If sign = 1 then frac[0-63] ← ¬frac[0-63] + 1


Do while frac[0] = 0
frac[0-63] ← frac[1-63] || '0'
exp ← exp - 1
End


Round Float(sign,exp,frac[0-63],FPSCR[RN])


If sign = 1 then FPSCR[FPRF] ← “-normal number”
If sign = 0 then FPSCR[FPRF] ← “+normal number”
frD[0] ← sign
frD[1-11] ← exp + 1023
frD[12-63] ← frac[1-52]
Done
Zero Operand


FPSCR[FR FI] ← 0b00
FPSCR[FPRF] ← “+zero”
frD ← 0x0000_0000_0000_0000
Done


Round Float(sign,exp,frac[0-63],round_mode)
In this example, ‘u’ represents an undefined hexadecimal digit. Comparisons ignore the u
bits.


inc ← 0
lsb ← frac[52]
gbit ← frac[53]
rbit ← frac[54]
xbit ← frac[55-63] > 0
If round_mode = 0b00 then
Do
If sign || lsb || gbit || rbit || xbit = 0bu11uu then inc ← 1
If sign || lsb || gbit || rbit || xbit = 0bu011u then inc ← 1
If sign || lsb || gbit || rbit || xbit = 0bu01u1 then inc ← 1
End
If round_mode = 0b10 then
Do
If sign || lsb || gbit || rbit || xbit = 0b0u1uu then inc ← 1
If sign || lsb || gbit || rbit || xbit = 0b0uu1u then inc ← 1
If sign || lsb || gbit || rbit || xbit = 0b0uuu1 then inc ← 1
End
If round_mode = 0b11 then
Do
If sign || lsb || gbit || rbit || xbit = 0b1u1uu then inc ← 1
If sign || lsb || gbit || rbit || xbit = 0b1uu1u then inc ← 1
If sign || lsb || gbit || rbit || xbit = 0b1uuu1 then inc ← 1
End
frac[0-52] ← frac[0-52] + inc
If carry_out = 1 then exp ← exp + 1
FPSCR[FR] ← inc
FPSCR[FI] ← gbit | rbit | xbit
FPSCR[XX] ← FPSCR[XX] | FPSCR[FI]
Return
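The convert-from-integer model and its Round Float step can be exercised in C. The sketch below is a hypothetical translation, and its function names are illustrative. It takes the 64-bit integer image of frB, follows the normalize, round, and pack steps above, and returns the bit image of frD. FPSCR updates are not modeled. On an IEEE 754 host whose default rounding mode is round to nearest, `convert_from_int64(v, 0)` should produce the same bits as the C conversion `(double)v`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* inc decision of the Round Float model (round_mode uses FPSCR[RN] encodings). */
static bool rf_increment(bool sign, bool lsb, bool g, bool r, bool x, unsigned mode)
{
    assert(mode <= 3u);
    switch (mode) {
    case 0u: return g && (lsb || r || x);     /* round to nearest */
    case 2u: return !sign && (g || r || x);   /* round toward +infinity */
    case 3u: return sign && (g || r || x);    /* round toward -infinity */
    default: return false;                    /* round toward zero */
    }
}

/* Hypothetical model of the convert-from-integer algorithm; returns frD's bits. */
static uint64_t convert_from_int64(int64_t frb, unsigned round_mode)
{
    if (frb == 0)
        return 0;                               /* Zero Operand: +0.0 */

    bool sign = frb < 0;
    /* frac[0-63]: bit 63 of the C integer is frac[0]. */
    uint64_t frac = sign ? ~(uint64_t)frb + 1u : (uint64_t)frb;
    int exp = 63;

    while (!(frac >> 63)) {                     /* normalize until frac[0] = 1 */
        frac <<= 1;
        exp -= 1;
    }

    bool lsb = (frac >> 11) & 1u;               /* frac[52] */
    bool g   = (frac >> 10) & 1u;               /* frac[53] */
    bool r   = (frac >> 9) & 1u;                /* frac[54] */
    bool x   = (frac & 0x1FFu) != 0;            /* frac[55-63] > 0 */

    uint64_t mant = frac >> 11;                 /* frac[0-52], 53 bits */
    mant += rf_increment(sign, lsb, g, r, x, round_mode);
    if (mant >> 53) {                           /* carry_out: frac[1-52] becomes 0, exp + 1 */
        mant >>= 1;
        exp += 1;
    }

    return ((uint64_t)sign << 63)
         | ((uint64_t)(exp + 1023) << 52)       /* frD[1-11] */
         | (mant & ((UINT64_C(1) << 52) - 1)); /* frD[12-63] = frac[1-52] */
}

/* Bit image of a double, for comparing results. */
static uint64_t double_bits(double d)
{
    uint64_t b;
    memcpy(&b, &d, sizeof b);
    return b;
}
```

The value 2^53 + 1 is a convenient check. It sits exactly halfway between two doubles, so round to nearest keeps 2^53 (even lsb), while round toward +infinity produces 2^53 + 2.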











D.5 Floating-Point Selection 


The following are examples of how the optional fsel instruction can be used to implement 
floating-point minimum and maximum functions, and certain simple forms of if-then-else 
constructions, without branching. 


The examples show program fragments in an imaginary, C-like, high-level programming 
language, and the corresponding program fragment using fsel and other PowerPC 
instructions. In the examples, a, b, x, y, and z are floating-point variables, which are 
assumed to be in FPRs fa, fb, fx, fy, and fz. FPR fs is assumed to be available for scratch
space.
Additional examples can be found in Section D.3, “Floating-Point Conversions.” 


Note that care must be taken in using fsel if IEEE compatibility is required, or if the values 
being tested can be NaNs or infinities; see Section D.5.4, “Notes.” 
D.5.1 Comparison to Zero


This section provides example code sequences for comparisons to zero.


High-level language:            PowerPC:
if a ≥ 0.0 then x ← y           fsel  fx,fa,fy,fz   (see Section D.5.4, “Notes,” number 1)
else x ← z

if a > 0.0 then x ← y           fneg  fs,fa
else x ← z                      fsel  fx,fs,fz,fy   (see Section D.5.4, “Notes,” numbers 1 and 2)

if a = 0.0 then x ← y           fsel  fx,fa,fy,fz
else x ← z                      fneg  fs,fa
                                fsel  fx,fs,fx,fz   (see Section D.5.4, “Notes,” number 1)
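To check the three sequences above, the following C sketch models fsel frD,frA,frC,frB as a plain comparison: frD = frC if frA ≥ 0.0, otherwise frB. The function names are hypothetical, and FPSCR effects are not modeled. The sketch also shows why -0.0 counts as "≥ 0.0" and how a NaN falls through to the last operand.

```c
#include <assert.h>

/* Hypothetical C model of fsel frD,frA,frC,frB.
 * -0.0 >= 0.0 is true; a NaN in frA fails the comparison, selecting frB. */
static double fsel(double fra, double frc, double frb)
{
    return (fra >= 0.0) ? frc : frb;
}

/* if a >= 0.0 then x <- y else x <- z:  fsel fx,fa,fy,fz */
static double if_ge_zero(double a, double y, double z)
{
    return fsel(a, y, z);
}

/* if a > 0.0 then x <- y else x <- z:   fneg fs,fa ; fsel fx,fs,fz,fy */
static double if_gt_zero(double a, double y, double z)
{
    double fs = -a;
    return fsel(fs, z, y);
}

/* if a = 0.0 then x <- y else x <- z:   fsel fx,fa,fy,fz ; fneg fs,fa ; fsel fx,fs,fx,fz */
static double if_eq_zero(double a, double y, double z)
{
    double fx = fsel(a, y, z);
    double fs = -a;
    return fsel(fs, fx, z);
}
```

Passing a NaN as `a` to `if_gt_zero` returns `y` instead of `z`, which is the incorrect result described in note 2. `if_ge_zero` returns `z` for a NaN, which is the correct result.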





D.5.2 Minimum and Maximum 


This section provides example code sequences for computing the minimum and maximum
of two values.


High-level language:            PowerPC:

x ← min(a,b)                    fsub  fs,fa,fb      (see Section D.5.4, “Notes,” numbers 3, 4, and 5)
                                fsel  fx,fs,fb,fa

x ← max(a,b)                    fsub  fs,fa,fb      (see Section D.5.4, “Notes,” numbers 3, 4, and 5)
                                fsel  fx,fs,fa,fb
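The minimum and maximum sequences can be sketched in C with the same hypothetical comparison-based model of fsel. The sketch makes note 3 concrete. If either operand is a NaN, fs is a NaN and fsel selects `a`. As a result, `min_fsel(1.0, NaN)` returns 1.0, but `min_fsel(NaN, 1.0)` returns the NaN.

```c
#include <assert.h>

/* Hypothetical C model of fsel frD,frA,frC,frB (FPSCR effects not modeled). */
static double fsel(double fra, double frc, double frb)
{
    return (fra >= 0.0) ? frc : frb;
}

/* x <- min(a,b):  fsub fs,fa,fb ; fsel fx,fs,fb,fa */
static double min_fsel(double a, double b)
{
    double fs = a - b;
    return fsel(fs, b, a);
}

/* x <- max(a,b):  fsub fs,fa,fb ; fsel fx,fs,fa,fb */
static double max_fsel(double a, double b)
{
    double fs = a - b;
    return fsel(fs, a, b);
}
```

Whether this asymmetric NaN behavior is acceptable depends on how the application defines minimum and maximum for NaN operands, as note 3 points out.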
D.5.3 Simple If-Then-Else Constructions 


This section provides example code sequences for simple if-then-else statements.


High-level language:            PowerPC:
if a ≥ b then x ← y             fsub  fs,fa,fb
else x ← z                      fsel  fx,fs,fy,fz   (see Section D.5.4, “Notes,” numbers 4 and 5)

if a > b then x ← y             fsub  fs,fb,fa
else x ← z                      fsel  fx,fs,fz,fy   (see Section D.5.4, “Notes,” numbers 3, 4, and 5)

if a = b then x ← y             fsub  fs,fa,fb
else x ← z                      fsel  fx,fs,fy,fz
                                fneg  fs,fs
                                fsel  fx,fs,fx,fz   (see Section D.5.4, “Notes,” numbers 4 and 5)
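The if-then-else sequences can be sketched in C the same way. The sketch uses a hypothetical comparison-based fsel and assumes that the host's invalid-operation exceptions are disabled, which is the default in C. It illustrates note 4: for a = b = +infinity, a - b is a NaN, so `if_ge` returns `z` even though a ≥ b holds.

```c
#include <assert.h>

static double fsel(double fra, double frc, double frb)
{
    return (fra >= 0.0) ? frc : frb;   /* hypothetical model of fsel */
}

/* if a >= b:  fsub fs,fa,fb ; fsel fx,fs,fy,fz */
static double if_ge(double a, double b, double y, double z)
{
    double fs = a - b;
    return fsel(fs, y, z);
}

/* if a > b:   fsub fs,fb,fa ; fsel fx,fs,fz,fy */
static double if_gt(double a, double b, double y, double z)
{
    double fs = b - a;
    return fsel(fs, z, y);
}

/* if a = b:   fsub fs,fa,fb ; fsel fx,fs,fy,fz ; fneg fs,fs ; fsel fx,fs,fx,fz */
static double if_eq(double a, double b, double y, double z)
{
    double fs = a - b;
    double fx = fsel(fs, y, z);
    fs = -fs;
    return fsel(fs, fx, z);
}
```

The a = b case needs two selections because fsel tests only "≥ 0". The first selection tests a - b ≥ 0 and the second tests b - a ≥ 0, so `y` survives only when both tests hold.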


D.5.4 Notes 


The following notes apply to the examples found in Section D.5.1, “Comparison to Zero,”
Section D.5.2, “Minimum and Maximum,” and Section D.5.3, “Simple If-Then-Else
Constructions,” and to the corresponding cases using the other three arithmetic relations (<,
≤, and ≠). These notes should also be considered when any other use of fsel is contemplated.
In these notes the “optimized program” is the PowerPC program shown, and the
“unoptimized program” (not shown) is the corresponding PowerPC program that uses
fcmpu and branch conditional instructions instead of fsel.


1. The unoptimized program affects the VXSNAN bit of the FPSCR, and therefore 
may cause the system error handler to be invoked if the corresponding exception is 
enabled, while the optimized program does not affect this bit. This property of the 
optimized program is incompatible with the IEEE standard. (Note that the 
architecture specification also refers to exceptions as interrupts.) 


2. The optimized program gives the incorrect result if ‘a’ is a NaN. 


3. The optimized program gives the incorrect result if ‘a’ and/or ‘b’ is a NaN (except 
that it may give the correct result in some cases for the minimum and maximum 
functions, depending on how those functions are defined to operate on NaNs). 


4. The optimized program gives the incorrect result if ‘a’ and ‘b’ are infinities of the 
same sign. (Here it is assumed that invalid operation exceptions are disabled, in 
which case the result of the subtraction is a NaN. The analysis is more complicated 
if invalid operation exceptions are enabled, because in that case the target register of 
the subtraction is unchanged.) 


5. The optimized program affects the OX, UX, XX, and VXISI bits of the FPSCR, and 
therefore may cause the system error handler to be invoked if the corresponding 
exceptions are enabled, while the unoptimized program does not affect these bits. 
This property of the optimized program is incompatible with the IEEE standard. 


D.6 Floating-Point Load Instructions 


There are two basic forms of load instruction—single-precision and double-precision. 
Because the FPRs support only floating-point double format, single-precision load floating- 
point instructions convert single-precision data to double-precision format prior to loading 
the operands into the target FPR. The conversion and loading steps follow: 


Let WORD[0-31] be the floating-point single-precision operand accessed from memory.


Normalized Operand
If WORD[1-8] > 0 and WORD[1-8] < 255
frD[0-1] ← WORD[0-1]
frD[2] ← ¬WORD[1]
frD[3] ← ¬WORD[1]
frD[4] ← ¬WORD[1]
frD[5-63] ← WORD[2-31] || (29)0


Denormalized Operand
If WORD[1-8] = 0 and WORD[9-31] ≠ 0
sign ← WORD[0]
exp ← -126
frac[0-52] ← 0b0 || WORD[9-31] || (29)0
normalize the operand
Do while frac[0] = 0
frac ← frac[1-52] || 0b0
exp ← exp - 1
End
frD[0] ← sign
frD[1-11] ← exp + 1023
frD[12-63] ← frac[1-52]


Infinity / QNaN / SNaN / Zero
If WORD[1-8] = 255 or WORD[1-31] = 0
frD[0-1] ← WORD[0-1]
frD[2] ← WORD[1]
frD[3] ← WORD[1]
frD[4] ← WORD[1]
frD[5-63] ← WORD[2-31] || (29)0
For double-precision floating-point load instructions, no conversion is required as the data 
from memory is copied directly into the FPRs. 
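The single-precision load conversion can be checked against a host's own float-to-double conversion. The C sketch below is a hypothetical model that follows the three cases above, using IBM bit numbering (WORD[0] is the most-significant bit). The reference function assumes an IEEE 754 host. The two should agree for every non-NaN word. For a signaling NaN, the host conversion may quiet the NaN, whereas the model copies the bits unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the single-precision load conversion: WORD -> frD bits. */
static uint64_t load_single_to_double_bits(uint32_t word)
{
    uint32_t exp8   = (word >> 23) & 0xFFu;     /* WORD[1-8] */
    uint32_t frac23 = word & 0x7FFFFFu;         /* WORD[9-31] */

    if (exp8 == 0 && frac23 != 0) {
        /* Denormalized operand: normalize a 53-bit frac[0-52] (frac[0] is bit 52). */
        uint64_t sign = word >> 31;
        int exp = -126;
        uint64_t frac = (uint64_t)frac23 << 29; /* 0b0 || WORD[9-31] || (29)0 */
        while (!(frac & (UINT64_C(1) << 52))) {
            frac <<= 1;
            exp -= 1;
        }
        return (sign << 63)
             | ((uint64_t)(exp + 1023) << 52)
             | (frac & ((UINT64_C(1) << 52) - 1)); /* frac[1-52] */
    }

    /* Normalized operands copy the inverse of WORD[1] into frD[2-4];
     * infinity, NaN, and zero copy WORD[1] itself. */
    uint64_t w1 = (word >> 30) & 1u;
    uint64_t fill = (exp8 != 0 && exp8 != 255) ? (w1 ^ 1u) : w1;
    return ((uint64_t)(word >> 30) << 62)          /* frD[0-1] = WORD[0-1] */
         | (fill * UINT64_C(7) << 59)              /* frD[2-4] */
         | ((uint64_t)(word & 0x3FFFFFFFu) << 29); /* frD[5-63] = WORD[2-31] || (29)0 */
}

/* Reference: bit image of the host's float -> double conversion. */
static uint64_t c_float_to_double_bits(uint32_t word)
{
    float f;
    double d;
    uint64_t bits;
    memcpy(&f, &word, sizeof f);
    d = f;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

Replicating WORD[1] (or its inverse) into frD[2-4] is what rebiases the 8-bit exponent (bias 127) into the 11-bit exponent (bias 1023) without an explicit addition.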


Many floating-point load instructions have an update form in which register rA is updated
with the EA. For these forms, if operand rA ≠ 0, the effective address (EA) is placed into
register rA and the memory element (word or double word) addressed by the EA is loaded
into the floating-point register specified by operand frD; if operand rA = 0, the instruction
form is invalid.


Recall that rA, rB, and rD denote GPRs, while frA, frB, frC, frS, and frD denote FPRs. 


D.7 Floating-Point Store Instructions 


There are three basic forms of store instruction—single-precision, double-precision, and 
integer. The integer form is provided by the optional stfiwx instruction. Because the FPRs 
support only floating-point double format for floating-point data, single-precision store 
floating-point instructions convert double-precision data to single-precision format prior to 
storing the operands into memory. The conversion steps follow: 


Let WORD[0-31] be the word written to in memory.


No Denormalization Required (includes Zero/Infinity/NaN)


if frS[1-11] > 896 or frS[1-63] = 0 then
WORD[0-1] ← frS[0-1]
WORD[2-31] ← frS[5-34]


Denormalization Required


if 874 ≤ frS[1-11] ≤ 896 then
sign ← frS[0]
exp ← frS[1-11] - 1023
frac[0-52] ← 0b1 || frS[12-63]
Denormalize operand
Do while exp < -126
frac[0-52] ← 0b0 || frac[0-51]
exp ← exp + 1
End
WORD[0] ← sign
WORD[1-8] ← 0x00
WORD[9-31] ← frac[1-23]
else WORD ← undefined
Notice that if the value to be stored by a single-precision store floating-point instruction is
larger in magnitude than the maximum number representable in single format, the first case 
mentioned, “No Denormalization Required,” applies. The result stored in WORD is then a 
well-defined value, but is not numerically equal to the value in the source register (that is, 
the result of a single-precision load floating-point from WORD will not compare equal to 
the contents of the original source register). 


Note that the description of conversion steps presented here is only a model. The actual 
implementation may vary from this description but must produce results equivalent to what 
this model would produce. 


It is important to note that for double-precision store floating-point instructions and for the 
store floating-point as integer word instruction no conversion is required as the data from 
the FPR is copied directly into memory. 
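The store conversion model can be sketched in C as well. The sketch below is hypothetical. It follows the two cases above and returns false where the model leaves WORD undefined. Unlike C's `(float)` conversion, the model selects bits and denormalizes without rounding, so it is compared here only with values that are exactly representable in single format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the single-precision store conversion: frS bits -> WORD. */
static bool store_double_to_single_bits(uint64_t frs, uint32_t *word)
{
    uint32_t exp11 = (uint32_t)(frs >> 52) & 0x7FFu;           /* frS[1-11] */

    if (exp11 > 896 || (frs & ~(UINT64_C(1) << 63)) == 0) {
        /* No denormalization required: bit selection only. */
        *word = ((uint32_t)(frs >> 62) << 30)                   /* WORD[0-1] = frS[0-1] */
              | ((uint32_t)(frs >> 29) & 0x3FFFFFFFu);          /* WORD[2-31] = frS[5-34] */
        return true;
    }
    if (exp11 >= 874) {                                         /* 874 <= frS[1-11] <= 896 */
        int exp = (int)exp11 - 1023;
        uint64_t frac = (UINT64_C(1) << 52)
                      | (frs & ((UINT64_C(1) << 52) - 1));      /* 0b1 || frS[12-63] */
        while (exp < -126) {                                    /* denormalize */
            frac >>= 1;
            exp += 1;
        }
        *word = ((uint32_t)(frs >> 63) << 31)                   /* WORD[0] = sign, WORD[1-8] = 0 */
              | ((uint32_t)(frac >> 29) & 0x7FFFFFu);           /* WORD[9-31] = frac[1-23] */
        return true;
    }
    return false;                                               /* WORD undefined */
}

static uint64_t double_bits(double d) { uint64_t b; memcpy(&b, &d, sizeof b); return b; }
static uint32_t float_bits(float f)   { uint32_t b; memcpy(&b, &f, sizeof b); return b; }
```

A value such as 1e300 also takes the first path. The function then returns true with a well-defined WORD that is not numerically equal to the source, which is the situation described in the paragraph on out-of-range values.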
Appendix E
Synchronization Programming 
Examples 


The examples in this appendix show how synchronization instructions can be used to 
emulate various synchronization primitives and how to provide more complex forms of 
synchronization. 


For each of these examples, it is assumed that a similar sequence of instructions is used by 
all processes requiring synchronization of the accessed data. 


E.1 General Information 


The following points provide general information about the lwarx and stwcx. instructions:


• In general, lwarx and stwcx. instructions should be paired, with the same effective
  address (EA) used for both. The only exception is that an unpaired stwcx. instruction
  to any (scratch) effective address can be used to clear any reservation held by the
  processor.


• It is acceptable to execute an lwarx instruction for which no stwcx. instruction is
  executed. Such a dangling lwarx instruction occurs in the example shown in
  Section E.2.5, “Test and Set,” if the value loaded is not zero.


• To increase the likelihood that forward progress is made, it is important that looping
  on lwarx/stwcx. pairs be minimized. For example, in the sequence shown in
  Section E.2.5, “Test and Set,” this is achieved by testing the old value before
  attempting the store; were the order reversed, more stwcx. instructions might be
  executed, and reservations might more often be lost between the lwarx and the
  stwcx. instructions.


• The manner in which lwarx and stwcx. are communicated to other processors and
  mechanisms, and between levels of the memory subsystem within a given processor,
  is implementation-dependent. In some implementations, performance may be
  improved by minimizing looping on an lwarx instruction that fails to return a
  desired value. For example, in the sequence shown in Section E.2.5, “Test and
  Set,” if the program stays in the loop until the word loaded is zero, the programmer
  can change the “bne- $+12” to “bne- loop.”
  In some implementations, better performance may be obtained by using an ordinary
  load instruction to do the initial checking of the value, as follows:


  loop: lwz    r5,0(r3)  #load the word
        cmpwi  r5,0      #loop back if word
        bne-   loop      #not equal to 0
        lwarx  r5,0,r3   #try again, reserving
        cmpwi  r5,0      #(likely to succeed)
        bne-   loop
        stwcx. r4,0,r3   #try to store nonzero
        bne-   loop      #loop if lost reservation


• In a multiprocessor, livelock (a state in which processors interact in a way such that
  no processor makes progress) is possible if a loop containing an lwarx/stwcx. pair
  also contains an ordinary store instruction for which any byte of the affected
  memory area is in the reservation granule of the reservation. For example, the first
  code sequence shown in Section E.5, “List Insertion,” can cause livelock if two list
  elements have next element pointers in the same reservation granule.
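The idea of checking the value with an ordinary load before the reserving attempt has a direct analogue in C11 `<stdatomic.h>`. The sketch below is an analogy, not a translation. The compare-exchange stands in for the lwarx/cmpwi/stwcx. group (compilers targeting PowerPC commonly expand it into such a loop), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: wait until *word is zero, then atomically replace it
 * with a nonzero newval. */
static void wait_zero_then_store(_Atomic uint32_t *word, uint32_t newval)
{
    assert(newval != 0);
    for (;;) {
        /* Ordinary load (like lwz): spin without requesting a reservation. */
        if (atomic_load_explicit(word, memory_order_relaxed) != 0)
            continue;
        /* Reserving attempt: succeeds only if the word is still zero. */
        uint32_t expected = 0;
        if (atomic_compare_exchange_weak_explicit(word, &expected, newval,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return;
        /* Lost the race, or a spurious failure: start over (bne- loop). */
    }
}
```

The weak form of compare-exchange is used because it may fail spuriously, much as stwcx. may fail after a lost reservation, and the surrounding loop already retries.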


E.2 Synchronization Primitives 


The following examples show how the lwarx and stwcx. instructions can be used to
emulate various synchronization primitives. The sequences used to emulate the various
primitives consist primarily of a loop using the lwarx and stwcx. instructions. Additional
synchronization is unnecessary, because the stwcx. will fail, clearing the EQ bit, if the word
loaded by lwarx has changed before the stwcx. is executed.


E.2.1 Fetch and No-Op 


The fetch and no-op primitive atomically loads the current value in a word in memory. In 
this example, it is assumed that the address of the word to be loaded is in GPR3 and the data 
loaded are returned in GPR4. 


loop: lwarx  r4,0,r3  #load and reserve
      stwcx. r4,0,r3  #store old value if still reserved
      bne-   loop     #loop if lost reservation


The stwcx., if it succeeds, stores to the destination location the same value that was loaded
by the preceding lwarx. While the store is redundant with respect to the value in the
location, its success ensures that the value loaded by the lwarx was the current value (that
is, the source of the value loaded by the lwarx was the last store to the location that
preceded the stwcx. in the coherence order for the location).
E.2.2 Fetch and Store


The fetch and store primitive atomically loads and replaces a word in memory. 


In this example, it is assumed that the address of the word to be loaded and replaced is in
GPR3, the new value is in GPR4, and the old value is returned in GPR5.


loop: lwarx  r5,0,r3  #load and reserve
      stwcx. r4,0,r3  #store new value if still reserved
      bne-   loop     #loop if lost reservation


E.2.3 Fetch and Add 


The fetch and add primitive atomically increments a word in memory. 


In this example, it is assumed that the address of the word to be incremented is in GPR3,
the increment is in GPR4, and the old value is returned in GPR5.


loop: lwarx  r5,0,r3   #load and reserve
      add    r0,r4,r5  #increment word
      stwcx. r0,0,r3   #store new value if still reserved
      bne-   loop      #loop if lost reservation
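For comparison, here is a C11 sketch of the same retry structure. It is a hypothetical function. C11 already provides `atomic_fetch_add_explicit`, which compilers targeting PowerPC typically expand into a lwarx/add/stwcx. loop, so the hand-written loop only exposes the shape of that expansion. No ordering beyond atomicity is requested, matching the sequence above, which contains no barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical fetch-and-add from a compare-exchange retry loop;
 * returns the old value, like GPR5 in the sequence above. */
static uint32_t fetch_and_add(_Atomic uint32_t *word, uint32_t increment)
{
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);  /* like lwarx */
    /* On failure, old is refreshed with the current value; retry (bne- loop). */
    while (!atomic_compare_exchange_weak_explicit(word, &old, old + increment,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;
    return old;
}
```

Unlike a reservation, compare-exchange succeeds whenever the value still matches. For fetch-and-add that difference does not change the result.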


E.2.4 Fetch and AND 


The fetch and AND primitive atomically ANDs a value into a word in memory. 


In this example, it is assumed that the address of the word to be ANDed is in GPR3, the
value to AND into it is in GPR4, and the old value is returned in GPR5.


loop: lwarx  r5,0,r3   #load and reserve
      and    r0,r4,r5  #AND word
      stwcx. r0,0,r3   #store new value if still reserved
      bne-   loop      #loop if lost reservation


This sequence can be changed to perform another Boolean operation atomically on a word 
in memory, simply by changing the AND instruction to the desired Boolean instruction 
(OR, XOR, etc.). 
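The observation that only the middle instruction changes can be expressed in C by passing the Boolean operation as a parameter. The sketch below is hypothetical: the `word_op` callback plays the role of the and/or/xor instruction between the load and the conditional store, and the compare-exchange stands in for the lwarx/stwcx. pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t (*word_op)(uint32_t old, uint32_t operand);

static uint32_t op_and(uint32_t old, uint32_t operand) { return old & operand; }
static uint32_t op_or(uint32_t old, uint32_t operand)  { return old | operand; }
static uint32_t op_xor(uint32_t old, uint32_t operand) { return old ^ operand; }

/* Hypothetical fetch-and-<op>; returns the old value. */
static uint32_t fetch_and_op(_Atomic uint32_t *word, uint32_t operand, word_op op)
{
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(word, &old, op(old, operand),
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;   /* old now holds the current value; recompute and retry */
    return old;
}
```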


E.2.5 Test and Set 


This version of the test and set primitive atomically loads a word from memory, ensures that 
the word in memory is a nonzero value, and sets CR0[EQ] according to whether the value
loaded is zero. 


In this example, it is assumed that the address of the word to be tested is in GPR3, the new
value (nonzero) is in GPR4, and the old value is returned in GPR5.


loop: lwarx  r5,0,r3  #load and reserve
      cmpwi  r5,0     #done if word
      bne-   $+12     #not equal to 0
      stwcx. r4,0,r3  #try to store non-zero
      bne-   loop     #loop if lost reservation
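A C11 sketch of the same test and set is shown below. The function is hypothetical. Its boolean return value plays the role of CR0[EQ], and `*old_out` plays the role of GPR5. As in the assembly sequence, no store is attempted when the old value is nonzero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical test-and-set: returns true (CR0[EQ] = 1) when the old value
 * was zero and newval was stored; *old_out receives the old value. */
static bool test_and_set(_Atomic uint32_t *word, uint32_t newval, uint32_t *old_out)
{
    assert(newval != 0);
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);  /* lwarx */
    for (;;) {
        if (old != 0) {              /* cmpwi / bne- $+12: done, no store attempted */
            *old_out = old;
            return false;
        }
        if (atomic_compare_exchange_weak_explicit(word, &old, newval,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
            *old_out = 0;            /* stwcx. succeeded */
            return true;
        }
        /* Failed (bne- loop): old holds the current value; test it again. */
    }
}
```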
E.3 Compare and Swap


The compare and swap primitive atomically compares a value in a register with a word in
memory. If they are equal, it stores the value from a second register into the word in
memory. If they are unequal, it loads the word from memory into the first register. In
either case, it sets the EQ bit of the CR0 field to indicate the result of the comparison.


In this example, it is assumed that the address of the word to be tested is in GPR3, the word
that is compared is in GPR4, the new value is in GPR5, and the old value is returned in
GPR4.


loop: lwarx  r6,0,r3  #load and reserve
      cmpw   r4,r6    #first 2 operands equal ?
      bne-   exit     #skip if not
      stwcx. r5,0,r3  #store new value if still reserved
      bne-   loop     #loop if lost reservation
exit: mr     r4,r6    #return value from memory


Notes:
1. The semantics in this example are based on the IBM System/370™ compare and
   swap instruction. Other architectures may define this instruction differently.


2. Compare and swap is shown primarily for pedagogical reasons. It is useful on
   machines that lack the better synchronization facilities provided by the lwarx and
   stwcx. instructions. Although the instruction is atomic, it checks only whether
   the current value matches the old value. An error can occur if the value had been
   changed and restored before being tested.


3. In some applications, the second bne- instruction and/or the mr instruction can be
   omitted. The second bne- is needed only if the application requires that if the EQ bit
   of the CR0 field on exit indicates not equal, then the compared values in r4 and r6
   are in fact not equal. The mr is needed only if the application requires that if the
   compared values are not equal, then the word from memory is loaded into the
   register with which it was compared (rather than into a third register). If either, or
   both, of these instructions is omitted, the resulting compare and swap does not obey
   the IBM System/370 semantics.
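The register usage above can be mirrored in a C11 sketch with a hypothetical function. `*r4` is the comparand and receives the memory value on a mismatch, `r5` is the new value, and the return value is the analog of CR0[EQ]. A C11 compare-exchange also compares values rather than holding a reservation. For that reason, this sketch shares the limitation in note 2: a change that is restored before the exchange goes undetected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical System/370-style compare and swap using the registers above. */
static bool compare_and_swap(_Atomic uint32_t *word, uint32_t *r4, uint32_t r5)
{
    for (;;) {
        uint32_t r6 = atomic_load_explicit(word, memory_order_relaxed); /* lwarx */
        if (*r4 != r6) {               /* cmpw / bne- exit */
            *r4 = r6;                  /* exit: mr r4,r6 */
            return false;
        }
        uint32_t expected = r6;
        if (atomic_compare_exchange_weak_explicit(word, &expected, r5,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return true;               /* stwcx. succeeded */
        /* bne- loop: reload and compare again */
    }
}
```

In plain C11, `atomic_compare_exchange_strong` alone gives the same observable behavior, because on a mismatch it writes the current value back into its expected argument.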
E.4 Lock Acquisition and Release


This example provides an algorithm for locking that demonstrates the use of
synchronization with an atomic read/modify/write operation. GPR3 holds the address of a
shared memory location, which is an argument of the lock and unlock procedures.
This location is used as a lock to control access to some shared resource such as a data
structure. The lock is open when its value is zero and locked when it is one. Before
accessing the shared resource, a processor sets the lock by having the lock procedure call
TEST_AND_SET, which executes the code sequence in Section E.2.5, “Test and Set.” This
atomically loads the old value of the lock and writes the new value (1) given to it in GPR4,
returning the old value in GPR5 (not used in the following example) and setting the EQ bit
in CR0 according to whether the value loaded is zero. The lock procedure repeats the test
and set procedure until it successfully changes the value in the lock from zero to one.


The processor must not access the shared resource until it sets the lock. After the bne- 
instruction that checks for the successful test and set operation, the processor executes the 
isync instruction. This delays all subsequent instructions until all previous instructions have 
completed to the extent required by context synchronization. The sync instruction could be
used, but performance would be degraded because the sync instruction waits for all
outstanding memory accesses to complete with respect to other processors. This is not
necessary here.


lock: li   r4,1          #obtain lock
loop: bl   test_and_set  #test and set
      bne- loop          #retry until old = 0

                         #delay subsequent instructions until
                         #previous ones complete
      isync

      blr                #return


The unlock procedure writes a zero to the lock location. If the access to the shared resource 
includes write operations, most applications that use locking require the processor to 
execute a sync instruction to make its modification visible to all processors before releasing 
the lock. For this reason, the unlock procedure in the following example begins with a sync. 


unlock: sync           #delay until prior stores finish
        li   r4,0
        stw  r4,0(r3)  #store zero to lock location
        blr            #return
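The same lock protocol can be sketched with C11 atomics. The functions are hypothetical. The acquire ordering on the successful exchange plays roughly the role of the isync after the branch: later accesses to the shared resource cannot be performed before the lock is seen as taken. The release ordering on the store plays roughly the role of the leading sync in unlock. A compiler is free to implement these orderings with different, possibly lighter, barriers than the ones shown above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical spin lock on a word that is 0 when open and 1 when locked. */
static void lock(_Atomic uint32_t *lockword)
{
    uint32_t expected;
    do {
        expected = 0;   /* only a 0 -> 1 transition acquires the lock */
    } while (!atomic_compare_exchange_weak_explicit(lockword, &expected, 1u,
                                                    memory_order_acquire,
                                                    memory_order_relaxed));
}

static void unlock(_Atomic uint32_t *lockword)
{
    /* Earlier accesses to the shared resource become visible before the
     * lock reads as open. */
    atomic_store_explicit(lockword, 0u, memory_order_release);
}
```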




E.5 List Insertion 


The following example shows how the lwarx and stwcx. instructions can be used to
implement simple LIFO (last-in-first-out) insertion into a singly-linked list. (Complicated 
list insertion, in which multiple values must be changed atomically, or in which the correct 
order of insertion depends on the contents of the elements, cannot be implemented in the 
manner shown below, and requires a more complicated strategy such as using locks.) 


The next element pointer from the list element after which the new element is to be inserted, 
here called the parent element, is stored into the new element, so that the new element 
points to the next element in the list—this store is performed unconditionally. Then the 
address of the new element is conditionally stored into the parent element, thereby adding 
the new element to the list. 


In this example, it is assumed that the address of the parent element is in GPR3, the address 
of the new element is in GPR4, and the next element pointer is at offset zero from the start 
of the element. It is also assumed that the next element pointer of each list element is in a 
reservation granule separate from that of the next element pointer of all other list elements. 


loop: lwarx  r2,0,r3   #get next pointer
      stw    r2,0(r4)  #store in new element
      sync             #let store settle (can omit if not MP)
      stwcx. r4,0,r3   #add new element to list
      bne-   loop      #loop if stwcx. failed


In the preceding example, if two list elements have next element pointers in the same 
reservation granule in a multiprocessor system, livelock can occur. 


If it is not possible to allocate list elements such that each element’s next element pointer 
is in a different reservation granule, livelock can be avoided by using the following 
sequence: 


       lwz    r2,0(r3)  #get next pointer

loop1: mr     r5,r2     #keep a copy
       stw    r2,0(r4)  #store in new element
       sync             #let store settle

loop2: lwarx  r2,0,r3   #get it again
       cmpw   r2,r5     #loop if changed (someone
       bne-   loop1     #else progressed)
       stwcx. r4,0,r3   #add new element to list
       bne-   loop2     #loop if failed
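A C11 sketch of the same LIFO insertion follows. The `element` type and the `insert_after` function are hypothetical, and the next element pointer sits at offset zero as assumed above. The release ordering on the successful exchange plays the role of the sync: the new element's next pointer becomes visible before the element is linked in. Because compare-exchange compares the pointer value instead of holding a reservation across the ordinary store, this sketch is structurally closer to the second sequence. A compiler expansion of the exchange is likely to contain no ordinary store between its lwarx and stwcx.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list element: the next element pointer is at offset zero. */
typedef struct element {
    _Atomic(struct element *) next;
    int value;
} element;

/* LIFO insertion of elem after parent. */
static void insert_after(element *parent, element *elem)
{
    element *next = atomic_load_explicit(&parent->next, memory_order_relaxed);
    do {
        /* Unconditional store into the new element (like stw r2,0(r4)). */
        atomic_store_explicit(&elem->next, next, memory_order_relaxed);
        /* Link it in only if parent->next is unchanged; on failure next is
         * refreshed and the store above is repeated. */
    } while (!atomic_compare_exchange_weak_explicit(&parent->next, &next, elem,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}
```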
Appendix F
Simplified Mnemonics 


This appendix is provided in order to simplify the writing and comprehension of assembler 
language programs. Included are a set of simplified mnemonics and symbols that define the 
simple shorthand used for the most frequently-used forms of branch conditional, compare, 
trap, rotate and shift, and certain other instructions. (Note that the architecture specification 
refers to simplified mnemonics as extended mnemonics.) 


F.1 Symbols 


The symbols in Table F-1 are defined for use in instructions (basic or simplified 
mnemonics) that specify a condition register (CR) field or a bit in the CR. 


Table F-1. Condition Register Bit and Identification Symbol Descriptions 


Symbol  Value  Bit Field Range  Description
lt      0      -                Less than. Identifies a bit number within a CR field.
gt      1      -                Greater than. Identifies a bit number within a CR field.
eq      2      -                Equal. Identifies a bit number within a CR field.
so      3      -                Summary overflow. Identifies a bit number within a CR field.
un      3      -                Unordered (after floating-point comparison). Identifies a bit
                                number in a CR field.
cr0     0      0-3              CR0 field
cr1     1      4-7              CR1 field
cr2     2      8-11             CR2 field
cr3     3      12-15            CR3 field
cr4     4      16-19            CR4 field
cr5     5      20-23            CR5 field
cr6     6      24-27            CR6 field
cr7     7      28-31            CR7 field


Note: To identify a CR bit, an expression in which a CR field symbol is multiplied by 4 and then added to a
bit-number-within-CR-field symbol can be used.










Note that the simplified mnemonics in Section F.5.2, “Basic Branch Mnemonics,” and 
Section F.6, “Simplified Mnemonics for Condition Register Logical Instructions,” require 
identification of a CR bit—if one of the CR field symbols is used, it must be multiplied by 
4 and added to a bit-number-within-CR-field (value in the range of 0-3, explicit or 
symbolic). The simplified mnemonics in Section F.5.3, “Branch Mnemonics Incorporating 
Conditions,” and Section F.3, “Simplified Mnemonics for Compare Instructions,” require
identification of a CR field—if one of the CR field symbols is used, it must not be multiplied 
by 4. (For the simplified mnemonics in Section F.5.3, “Branch Mnemonics Incorporating 
Conditions,” the bit number within the CR field is part of the simplified mnemonic. The CR 
field is identified, and the assembler does the multiplication and addition required to 
produce a CR bit number for the BI field of the underlying basic mnemonic.) 
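The "multiply by 4 and add" rule amounts to a one-line formula. The C sketch below defines enumerators that mirror the assembler symbols. The bit symbols lt, gt, eq, so, and un take the values 0-3, and the CR field symbols cr0-cr7 take the values 0-7, as used in examples such as cmpwi cr4,rA,100. The helper name `cr_bit` is hypothetical.

```c
#include <assert.h>

/* Enumerators mirroring the assembler symbols. */
enum cr_bit_in_field { lt = 0, gt = 1, eq = 2, so = 3, un = 3 };
enum cr_field { cr0 = 0, cr1, cr2, cr3, cr4, cr5, cr6, cr7 };

/* CR bit number (0-31) for a BI operand: field * 4 + bit within the field. */
static unsigned cr_bit(enum cr_field field, enum cr_bit_in_field bit)
{
    assert((unsigned)field <= 7u && (unsigned)bit <= 3u);
    return 4u * (unsigned)field + (unsigned)bit;
}
```

For example, `cr_bit(cr5, eq)` is 4 × 5 + 2 = 22, the BI value that appears in the encoding of bdnzt 4 * cr5 + eq,target.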


F.2 Simplified Mnemonics for Subtract Instructions 


This section discusses simplified mnemonics for the subtract instructions. 


F.2.1 Subtract Immediate 


Although there is no subtract immediate instruction, its effect can be achieved by using an
add immediate instruction with the immediate operand negated. Simplified mnemonics are
provided that include this negation, making the intent of the computation clearer.


subi   rD,rA,value   (equivalent to addi rD,rA,-value)
subis  rD,rA,value   (equivalent to addis rD,rA,-value)
subic  rD,rA,value   (equivalent to addic rD,rA,-value)
subic. rD,rA,value   (equivalent to addic. rD,rA,-value)
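Because the negated operand must still fit the 16-bit signed immediate field, the range of `value` accepted by these mnemonics is the mirror image of the SIMM range. The C sketch below uses hypothetical helpers and assumes the assembler simply negates the operand and requires the result to fit in SIMM. Under that assumption, subi accepts values from -32767 to 32768.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: map "subi rD,rA,value" to the SIMM operand of addi.
 * Returns false if -value does not fit in a 16-bit signed immediate. */
static bool subi_to_addi_simm(int32_t value, int16_t *simm)
{
    if (value < -INT16_MAX || value > -(int32_t)INT16_MIN)
        return false;
    *simm = (int16_t)(-value);
    return true;
}

/* addi rD,rA,SIMM with rA != 0: rD = (rA) + EXTS(SIMM), modulo 2^32. */
static uint32_t addi(uint32_t ra, int16_t simm)
{
    return ra + (uint32_t)(int32_t)simm;
}
```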


F.2.2 Subtract 


The subtract from instructions subtract the second operand (rA) from the third (rB). 
Simplified mnemonics are provided that use the more normal order in which the third 
operand is subtracted from the second. Both these mnemonics can be coded with an o suffix 
and/or dot (.) suffix to cause the OE and/or Rc bit to be set in the underlying instruction. 


sub  rD,rA,rB  (equivalent to subf rD,rB,rA)
subc rD,rA,rB  (equivalent to subfc rD,rB,rA)
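The operand swap is easier to see with the arithmetic written out. The C sketch below is a hypothetical model. subf computes (rB) - (rA) as ¬(rA) + (rB) + 1. For subfc, the carry out of that sum, recorded in XER[CA], is 1 exactly when (rB) ≥ (rA) as unsigned values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: subf rD,rA,rB computes (rB) - (rA) as ~(rA) + (rB) + 1. */
static uint32_t subf(uint32_t ra, uint32_t rb)
{
    return ~ra + rb + 1u;
}

/* subfc also produces XER[CA], the carry out of ~(rA) + (rB) + 1. */
static uint32_t subfc(uint32_t ra, uint32_t rb, bool *ca)
{
    uint64_t sum = (uint64_t)(uint32_t)~ra + rb + 1u;
    *ca = (sum >> 32) != 0;
    return (uint32_t)sum;
}

/* sub rD,rA,rB is subf rD,rB,rA; subc rD,rA,rB is subfc rD,rB,rA. */
static uint32_t sub(uint32_t ra, uint32_t rb)            { return subf(rb, ra); }
static uint32_t subc(uint32_t ra, uint32_t rb, bool *ca) { return subfc(rb, ra, ca); }
```

With the simplified mnemonics, `sub(10, 3)` reads as 10 - 3 = 7. The underlying subf receives the same registers in the opposite order.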
F.3 Simplified Mnemonics for Compare Instructions


The crfD field can be omitted if the result of the comparison is to be placed into the CR0
field. Otherwise, the target CR field must be specified as the first operand. One of the CR
field symbols defined in Section F.1, “Symbols,” can be used for this operand.


Note that the basic compare mnemonics of PowerPC are the same as those of POWER, but
the POWER instructions have three operands while the PowerPC instructions have four.
The assembler recognizes a basic compare mnemonic with the three operands as the
POWER form, and generates the instruction with L = 0. The crfD field can normally be
omitted when the CR0 field is the target.


F.3.1 Word Comparisons 


The instructions listed in Table F-2 are simplified mnemonics that should be supported by 
assemblers for all PowerPC implementations. 


Table F-2. Simplified Mnemonics for Word Compare Instructions 


Operation                        Simplified Mnemonic     Equivalent to:
Compare Word Immediate           cmpwi  crfD,rA,SIMM     cmpi  crfD,0,rA,SIMM
Compare Word                     cmpw   crfD,rA,rB       cmp   crfD,0,rA,rB
Compare Logical Word Immediate   cmplwi crfD,rA,UIMM     cmpli crfD,0,rA,UIMM
Compare Logical Word             cmplw  crfD,rA,rB       cmpl  crfD,0,rA,rB





Following are examples using the word compare mnemonics.


1. Compare rA with immediate value 100 as signed 32-bit integers and place result in
   CR0.
   cmpwi rA,100  (equivalent to cmpi 0,0,rA,100)
2. Same as (1), but place results in CR4.
   cmpwi cr4,rA,100  (equivalent to cmpi 4,0,rA,100)
3. Compare rA and rB as unsigned 32-bit integers and place result in CR0.
   cmplw rA,rB  (equivalent to cmpl 0,0,rA,rB)
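Signed and logical compares can set different CR field bits for the same register contents, which is why example 3 uses cmplw. The C sketch below is a hypothetical model. It represents a 4-bit CR field as a value from 0 to 15 with LT as the high-order bit, followed by GT, EQ, and SO, where SO is copied from XER[SO].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits of a 4-bit CR field, LT being the high-order (first) bit. */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_SO = 1 };

/* cmpw / cmpwi: signed 32-bit comparison. */
static unsigned cmpw_field(int32_t a, int32_t b, bool xer_so)
{
    unsigned c = (a < b) ? CR_LT : (a > b) ? CR_GT : CR_EQ;
    return c | (xer_so ? CR_SO : 0u);
}

/* cmplw / cmplwi: unsigned (logical) 32-bit comparison. */
static unsigned cmplw_field(uint32_t a, uint32_t b, bool xer_so)
{
    unsigned c = (a < b) ? CR_LT : (a > b) ? CR_GT : CR_EQ;
    return c | (xer_so ? CR_SO : 0u);
}
```

The bit pattern 0xFFFF_FFFF is less than 1 under cmpw (it is -1) but greater than 1 under cmplw.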




F.4 Simplified Mnemonics for Rotate and Shift Instructions


The rotate and shift instructions provide powerful and general ways to manipulate register 
contents, but can be difficult to understand. Simplified mnemonics that allow some of the 
simpler operations to be coded easily are provided for the following types of operations: 


Extract—Select a field of n bits starting at bit position b in the source register; left 
or right justify this field in the target register; clear all other bits of the target register. 


Insert—Select a left-justified or right-justified field of n bits in the source register; 
insert this field starting at bit position b of the target register; leave other bits of the 
target register unchanged. (No simplified mnemonic is provided for insertion of a 
left-justified field, when operating on double words, because such an insertion 
requires more than one instruction.) 


Rotate—Rotate the contents of a register right or left n bits without masking. 


Shift—Shift the contents of a register right or left n bits, clearing vacated bits 
(logical shift). 

Clear—Clear the leftmost or rightmost n bits of a register. 

Clear left and shift left—Clear the leftmost b bits of a register, then shift the register 
left by n bits. This operation can be used to scale a (known non-negative) array index 
by the width of an element. 


F.4.1 Operations on Words 


The operations shown in Table F-3 are available in all implementations. All these 
mnemonics can be coded with a dot (.) suffix to cause the Rc bit to be set in the underlying 
instruction. 
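Table F-3. Word Rotate and Shift Instructions


Operation                             Simplified Mnemonic                Equivalent to:
Extract and left justify immediate    extlwi rA,rS,n,b (n > 0)           rlwinm rA,rS,b,0,n-1
Extract and right justify immediate   extrwi rA,rS,n,b (n > 0)           rlwinm rA,rS,b+n,32-n,31
Insert from left immediate            inslwi rA,rS,n,b (n > 0)           rlwimi rA,rS,32-b,b,(b+n)-1
Insert from right immediate           insrwi rA,rS,n,b (n > 0)           rlwimi rA,rS,32-(b+n),b,(b+n)-1
Rotate left immediate                 rotlwi rA,rS,n                     rlwinm rA,rS,n,0,31
Rotate right immediate                rotrwi rA,rS,n                     rlwinm rA,rS,32-n,0,31
Rotate left                           rotlw rA,rS,rB                     rlwnm rA,rS,rB,0,31
Shift left immediate                  slwi rA,rS,n (n < 32)              rlwinm rA,rS,n,0,31-n
Shift right immediate                 srwi rA,rS,n (n < 32)              rlwinm rA,rS,32-n,n,31
Clear left immediate                  clrlwi rA,rS,n (n < 32)            rlwinm rA,rS,0,n,31
Clear right immediate                 clrrwi rA,rS,n (n < 32)            rlwinm rA,rS,0,0,31-n
Clear left and shift left immediate   clrlslwi rA,rS,b,n (n ≤ b ≤ 31)    rlwinm rA,rS,n,b-n,31-n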
Examples using word mnemonics follow:


1. Extract the sign bit (bit 0) of rS and place the result right-justified into rA.
   extrwi rA,rS,1,0  (equivalent to rlwinm rA,rS,1,31,31)
2. Insert the bit extracted in (1) into the sign bit (bit 0) of rB.
   insrwi rB,rA,1,0  (equivalent to rlwimi rB,rA,31,0,0)
3. Shift the contents of rA left 8 bits.
   slwi rA,rA,8  (equivalent to rlwinm rA,rA,8,0,23)
4. Clear the high-order 16 bits of rS and place the result into rA.
   clrlwi rA,rS,16  (equivalent to rlwinm rA,rS,0,16,31)
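The examples above can be checked with a small C model of the two basic instructions. The sketch below is hypothetical. It builds MASK(mb,me) in IBM bit numbering (bit 0 is the most-significant bit), wraps around when mb > me, and defines a few simplified mnemonics through the equivalences listed for them.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n (taken modulo 32, like the 5-bit SH field). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* MASK(mb, me): ones from bit mb through bit me, wrapping when mb > me. */
static uint32_t mask32(unsigned mb, unsigned me)
{
    assert(mb < 32u && me < 32u);
    uint32_t from_mb = 0xFFFFFFFFu >> mb;          /* bits mb..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31u - me);  /* bits 0..me  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & mask32(mb, me);
}

static uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = mask32(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* Simplified mnemonics expressed through the basic forms. */
static uint32_t extrwi(uint32_t rs, unsigned n, unsigned b)
{
    return rlwinm(rs, b + n, 32u - n, 31u);
}
static uint32_t insrwi(uint32_t ra, uint32_t rs, unsigned n, unsigned b)
{
    return rlwimi(ra, rs, 32u - (b + n), b, b + n - 1u);
}
static uint32_t slwi(uint32_t rs, unsigned n)   { return rlwinm(rs, n, 0u, 31u - n); }
static uint32_t clrlwi(uint32_t rs, unsigned n) { return rlwinm(rs, 0u, n, 31u); }
```

For example, `insrwi(0x12345678, 1, 1, 0)` sets only the sign bit and gives 0x92345678, which matches example 2.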
F.5 Simplified Mnemonics for Branch Instructions


Mnemonics are provided so that branch conditional instructions can be coded with the 
condition as part of the instruction mnemonic rather than as a numeric operand. Some of 
these are shown as examples with the branch instructions. 


The mnemonics discussed in this section are variations of the branch conditional 
instructions. 


F.5.1 BO and BI Fields 


The 5-bit BO field in branch conditional instructions encodes the following operations. 


• Decrement count register (CTR)

• Test CTR equal to zero

• Test CTR not equal to zero

• Test condition true

• Test condition false

• Branch prediction (taken, fall through)


The 5-bit BI field in branch conditional instructions specifies which of the 32 bits in the CR 
represents the condition to test. 


To provide a simplified mnemonic for every possible combination of BO and BI fields
would require 2^10 = 1024 mnemonics, most of which would be only marginally useful.
The abbreviated set found in Section F.5.2, “Basic Branch Mnemonics,” is intended to
cover the most useful cases. Unusual cases can be coded using a basic branch conditional
mnemonic (bc, bclr, bcctr) with the condition to be tested specified as a numeric operand.
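How a branch conditional combines its BO and BI fields can be sketched in a few lines of Python. The sketch assumes the architected meaning of the five BO bits, numbered 0 (most-significant) to 4: BO[0] means ignore the condition, BO[1] is the CR bit value to branch on, BO[2] means do not decrement CTR, BO[3] selects branching on CTR = 0 rather than CTR ≠ 0, and BO[4] is the y prediction hint, which does not change the outcome. The function names are hypothetical.

```python
def bo_bit(bo, k):
    """Bit k of the 5-bit BO field (bit 0 = most-significant)."""
    return (bo >> (4 - k)) & 1


def cr_bit(cr, bi):
    """CR bit bi (bit 0 = most-significant bit of the 32-bit CR)."""
    return (cr >> (31 - bi)) & 1


def bc_outcome(bo, bi, cr, ctr):
    """Return (taken, new_ctr) for a branch conditional with operands BO, BI."""
    if not bo_bit(bo, 2):                    # BO[2] = 0: decrement CTR first
        ctr = (ctr - 1) & 0xFFFFFFFF
    # CTR test passes if it is disabled, or if (CTR != 0) differs from BO[3].
    ctr_ok = bo_bit(bo, 2) or ((ctr != 0) != bool(bo_bit(bo, 3)))
    # Condition test passes if it is disabled, or if CR[BI] equals BO[1].
    cond_ok = bo_bit(bo, 0) or (cr_bit(cr, bi) == bo_bit(bo, 1))
    return bool(ctr_ok and cond_ok), ctr


# bdnz target (bc 16,0,target): keep looping while the decremented CTR is nonzero.
print(bc_outcome(16, 0, cr=0, ctr=2))  # (True, 1)
print(bc_outcome(16, 0, cr=0, ctr=1))  # (False, 0)
```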


F.5.2 Basic Branch Mnemonics 


The mnemonics in Table F-4 allow all the common BO operand encodings to be specified
as part of the mnemonic, along with the absolute address (AA) and link (LK) bits.


Notice that there are no simplified mnemonics for relative and absolute unconditional 
branches. For these, the basic mnemonics b, ba, bl, and bla are used. 


Table F-4 provides the abbreviated set of simplified mnemonics for the most commonly 
performed conditional branches. 
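Table F-4. Simplified Branch Mnemonics

LR Update Not Enabled: bc, bca, bclr, bcctr | LR Update Enabled: bcl, bcla, bclrl, bcctrl

Branch Semantics | bc Relative | bca Absolute | bclr to LR | bcctr to CTR | bcl Relative | bcla Absolute | bclrl to LR | bcctrl to CTR
Branch unconditionally | - | - | blr | bctr | - | - | blrl | bctrl
Branch if condition true | bt | bta | btlr | btctr | btl | btla | btlrl | btctrl
Branch if condition false | bf | bfa | bflr | bfctr | bfl | bfla | bflrl | bfctrl
Decrement CTR, branch if CTR nonzero | bdnz | bdnza | bdnzlr | - | bdnzl | bdnzla | bdnzlrl | -
Decrement CTR, branch if CTR nonzero AND condition true | bdnzt | bdnzta | bdnztlr | - | bdnztl | bdnztla | bdnztlrl | -
Decrement CTR, branch if CTR nonzero AND condition false | bdnzf | bdnzfa | bdnzflr | - | bdnzfl | bdnzfla | bdnzflrl | -
Decrement CTR, branch if CTR zero | bdz | bdza | bdzlr | - | bdzl | bdzla | bdzlrl | -
Decrement CTR, branch if CTR zero AND condition true | bdzt | bdzta | bdztlr | - | bdztl | bdztla | bdztlrl | -
Decrement CTR, branch if CTR zero AND condition false | bdzf | bdzfa | bdzflr | - | bdzfl | bdzfla | bdzflrl | -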


The simplified mnemonics shown in Table F-4 that test a condition require a corresponding 
CR bit as the first operand of the instruction. The symbols defined in Section F.1, 
“Symbols,” can be used in the operand in place of a numeric value. 





The simplified mnemonics found in Table F-4 are used in the following examples: 


1. Decrement CTR and branch if it is still nonzero (closure of a loop controlled by a
count loaded into CTR).
bdnz target (equivalent to bc 16,0,target)

2. Same as (1) but branch only if CTR is nonzero and condition in CR0 is “equal.”
bdnzt eq,target (equivalent to bc 8,2,target)

3. Same as (2), but “equal” condition is in CR5.
bdnzt 4 * cr5 + eq,target (equivalent to bc 8,22,target)

4. Branch if bit 27 of CR is false.
bf 27,target (equivalent to bc 4,27,target)

5. Same as (4), but set the link register. This is a form of conditional call.
bfl 27,target (equivalent to bcl 4,27,target)
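The expansions in these examples follow a fixed pattern: the mnemonic selects BO, and the BI operand is a CR bit number computed as 4 * crN + offset. The Python sketch below reproduces them. It assumes the Section F.1 symbol values lt = 0, gt = 1, eq = 2, and so = 3, leaves the y (prediction) bit at 0, and uses hypothetical helper names.

```python
# BO value for each simplified mnemonic in Table F-4 (y bit left at 0).
BO_FOR = {
    "bdnzf": 0,   # decrement CTR, branch if CTR != 0 AND condition false
    "bdzf": 2,    # decrement CTR, branch if CTR == 0 AND condition false
    "bf": 4,      # branch if condition false
    "bdnzt": 8,   # decrement CTR, branch if CTR != 0 AND condition true
    "bdzt": 10,   # decrement CTR, branch if CTR == 0 AND condition true
    "bt": 12,     # branch if condition true
    "bdnz": 16,   # decrement CTR, branch if CTR != 0
    "bdz": 18,    # decrement CTR, branch if CTR == 0
}

# Assumed CR bit offsets within a 4-bit CR field (Section F.1 symbols).
LT, GT, EQ, SO = 0, 1, 2, 3


def cr_bit(field, offset):
    """BI operand for a bit in CR field crN, as in '4 * cr5 + eq'."""
    return 4 * field + offset


def expand(mnemonic, bi=0, link=False, target="target"):
    """Rewrite a simplified branch as basic 'bc BO,BI,target' (or bcl)."""
    op = "bcl" if link else "bc"
    return f"{op} {BO_FOR[mnemonic]},{bi},{target}"


print(expand("bdnzt", cr_bit(5, EQ)))  # bc 8,22,target
```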


Appendix F. Simplified Mnemonics F-7 


Table F-5 provides the simplified mnemonics for the bc and bca instructions without link
register updating, and the syntax associated with these instructions. Note that the default
condition register specified by the simplified mnemonics in the table is CR0.


Table F-5. Simplified Branch Mnemonics for bc and bca Instructions without Link
Register Update

LR Update Not Enabled

Branch Semantics | bc Relative | Simplified Mnemonic | bca Absolute | Simplified Mnemonic
Branch unconditionally | - | - | - | -
Branch if condition true | bc 12,0,target | bt 0,target | bca 12,0,target | bta 0,target
Branch if condition false | bc 4,0,target | bf 0,target | bca 4,0,target | bfa 0,target
Decrement CTR, branch if CTR nonzero | bc 16,0,target | bdnz target | bca 16,0,target | bdnza target
Decrement CTR, branch if CTR nonzero AND condition true | bc 8,0,target | bdnzt 0,target | bca 8,0,target | bdnzta 0,target
Decrement CTR, branch if CTR nonzero AND condition false | bc 0,0,target | bdnzf 0,target | bca 0,0,target | bdnzfa 0,target
Decrement CTR, branch if CTR zero | bc 18,0,target | bdz target | bca 18,0,target | bdza target
Decrement CTR, branch if CTR zero AND condition true | bc 10,0,target | bdzt 0,target | bca 10,0,target | bdzta 0,target
Decrement CTR, branch if CTR zero AND condition false | bc 2,0,target | bdzf 0,target | bca 2,0,target | bdzfa 0,target
Table F-6 provides the simplified mnemonics for the bclr and bcctr instructions without
link register updating, and the syntax associated with these instructions. Note that the
default condition register specified by the simplified mnemonics in the table is CR0.


Table F-6. Simplified Branch Mnemonics for bclr and bcctr Instructions without
Link Register Update

LR Update Not Enabled

Branch Semantics | bclr to LR | Simplified Mnemonic | bcctr to CTR | Simplified Mnemonic
Branch unconditionally | bclr 20,0 | blr | bcctr 20,0 | bctr
Branch if condition true | bclr 12,0 | btlr 0 | bcctr 12,0 | btctr 0
Branch if condition false | bclr 4,0 | bflr 0 | bcctr 4,0 | bfctr 0
Decrement CTR, branch if CTR nonzero | bclr 16,0 | bdnzlr | - | -
Decrement CTR, branch if CTR nonzero AND condition true | bclr 8,0 | bdnztlr 0 | - | -
Decrement CTR, branch if CTR nonzero AND condition false | bclr 0,0 | bdnzflr 0 | - | -
Decrement CTR, branch if CTR zero | bclr 18,0 | bdzlr | - | -
Decrement CTR, branch if CTR zero AND condition true | bclr 10,0 | bdztlr 0 | - | -
Decrement CTR, branch if CTR zero AND condition false | bclr 2,0 | bdzflr 0 | - | -
Table F-7 provides the simplified mnemonics for the bcl and bcla instructions with link
register updating, and the syntax associated with these instructions. Note that the default
condition register specified by the simplified mnemonics in the table is CR0.


Table F-7. Simplified Branch Mnemonics for bcl and bcla Instructions with Link
Register Update

LR Update Enabled

Branch Semantics | bcl Relative | Simplified Mnemonic | bcla Absolute | Simplified Mnemonic
Branch if condition true | bcl 12,0,target | btl 0,target | bcla 12,0,target | btla 0,target
Branch if condition false | bcl 4,0,target | bfl 0,target | bcla 4,0,target | bfla 0,target
Decrement CTR, branch if CTR nonzero | bcl 16,0,target | bdnzl target | bcla 16,0,target | bdnzla target
Decrement CTR, branch if CTR nonzero AND condition true | bcl 8,0,target | bdnztl 0,target | bcla 8,0,target | bdnztla 0,target
Decrement CTR, branch if CTR nonzero AND condition false | bcl 0,0,target | bdnzfl 0,target | bcla 0,0,target | bdnzfla 0,target
Decrement CTR, branch if CTR zero | bcl 18,0,target | bdzl target | bcla 18,0,target | bdzla target
Decrement CTR, branch if CTR zero AND condition true | bcl 10,0,target | bdztl 0,target | bcla 10,0,target | bdztla 0,target
Decrement CTR, branch if CTR zero AND condition false | bcl 2,0,target | bdzfl 0,target | bcla 2,0,target | bdzfla 0,target
Table F-8 provides the simplified mnemonics for the bclrl and bcctrl instructions with link
register updating, and the syntax associated with these instructions. Note that the default
condition register specified by the simplified mnemonics in the table is CR0.


Table F-8. Simplified Branch Mnemonics for bclrl and bcctrl Instructions with Link
Register Update

LR Update Enabled

Branch Semantics | bclrl to LR | Simplified Mnemonic | bcctrl to CTR | Simplified Mnemonic
Branch unconditionally | bclrl 20,0 | blrl | bcctrl 20,0 | bctrl
Branch if condition true | bclrl 12,0 | btlrl 0 | bcctrl 12,0 | btctrl 0
Branch if condition false | bclrl 4,0 | bflrl 0 | bcctrl 4,0 | bfctrl 0
Decrement CTR, branch if CTR nonzero | bclrl 16,0 | bdnzlrl | - | -
Decrement CTR, branch if CTR nonzero AND condition true | bclrl 8,0 | bdnztlrl 0 | - | -
Decrement CTR, branch if CTR nonzero AND condition false | bclrl 0,0 | bdnzflrl 0 | - | -
Decrement CTR, branch if CTR zero | bclrl 18,0 | bdzlrl | - | -
Decrement CTR, branch if CTR zero AND condition true | bclrl 10,0 | bdztlrl 0 | - | -
Decrement CTR, branch if CTR zero AND condition false | bclrl 2,0 | bdzflrl 0 | - | -
F.5.3 Branch Mnemonics Incorporating Conditions


The mnemonics defined in Table F-4 are variations of the branch if condition true and 
branch if condition false BO encodings, with the most useful values of BI represented in 
the mnemonic rather than specified as a numeric operand. 


A standard set of codes (shown in Table F-9) has been adopted for the most common 
combinations of branch conditions. 


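Table F-9. Standard Coding for Branch Conditions

Code | Description
lt | Less than
le | Less than or equal
eq | Equal
ge | Greater than or equal
gt | Greater than
nl | Not less than
ne | Not equal
ng | Not greater than
so | Summary overflow
ns | Not summary overflow
un | Unordered (after floating-point comparison)
nu | Not unordered (after floating-point comparison)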
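Table F-10 shows the simplified branch mnemonics incorporating conditions.

Table F-10. Simplified Branch Mnemonics with Comparison Conditions

LR Update Not Enabled: bc, bca, bclr, bcctr | LR Update Enabled: bcl, bcla, bclrl, bcctrl

Branch Semantics | bc Relative | bca Absolute | bclr to LR | bcctr to CTR | bcl Relative | bcla Absolute | bclrl to LR | bcctrl to CTR
Branch if less than | blt | blta | bltlr | bltctr | bltl | bltla | bltlrl | bltctrl
Branch if less than or equal | ble | blea | blelr | blectr | blel | blela | blelrl | blectrl
Branch if equal | beq | beqa | beqlr | beqctr | beql | beqla | beqlrl | beqctrl
Branch if greater than or equal | bge | bgea | bgelr | bgectr | bgel | bgela | bgelrl | bgectrl
Branch if greater than | bgt | bgta | bgtlr | bgtctr | bgtl | bgtla | bgtlrl | bgtctrl
Branch if not less than | bnl | bnla | bnllr | bnlctr | bnll | bnlla | bnllrl | bnlctrl
Branch if not equal | bne | bnea | bnelr | bnectr | bnel | bnela | bnelrl | bnectrl
Branch if not greater than | bng | bnga | bnglr | bngctr | bngl | bngla | bnglrl | bngctrl
Branch if summary overflow | bso | bsoa | bsolr | bsoctr | bsol | bsola | bsolrl | bsoctrl
Branch if not summary overflow | bns | bnsa | bnslr | bnsctr | bnsl | bnsla | bnslrl | bnsctrl
Branch if unordered | bun | buna | bunlr | bunctr | bunl | bunla | bunlrl | bunctrl
Branch if not unordered | bnu | bnua | bnulr | bnuctr | bnul | bnula | bnulrl | bnuctrl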





Instructions using the mnemonics in Table F-10 specify the condition register field in an
optional first operand. If the CR field being tested is CR0, this operand need not be
specified. One of the CR field symbols defined in Section F.1, “Symbols,” can be used for
this operand.


The simplified mnemonics found in Table F-10 are used in the following examples: 


1. Branch if CR0 reflects condition “not equal.”
bne target (equivalent to bc 4,2,target)

2. Same as (1) but condition is in CR3.
bne cr3,target (equivalent to bc 4,14,target)

3. Branch to an absolute target if CR4 specifies “greater than,” setting the link register.
This is a form of conditional “call.”
bgtla cr4,target (equivalent to bcla 12,17,target)

4. Same as (3), but target address is in the CTR.
bgtctrl cr4 (equivalent to bcctrl 12,17)
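Each condition code reduces to a BO value (12 for “branch if the CR bit is set,” 4 for “branch if it is clear”) and a bit offset within the chosen CR field. The Python sketch below expands the condition mnemonics that way and reproduces the four examples. The (BO, offset) table is an assumption written out from the standard codes, using lt = 0, gt = 1, eq = 2, so = un = 3, with the y bit left at 0. The function name is hypothetical.

```python
# (BO, bit offset within the CR field) for each standard condition code.
CONDITIONS = {
    "lt": (12, 0), "le": (4, 1), "eq": (12, 2), "ge": (4, 0),
    "gt": (12, 1), "nl": (4, 0), "ne": (4, 2), "ng": (4, 1),
    "so": (12, 3), "ns": (4, 3), "un": (12, 3), "nu": (4, 3),
}

# Forms whose target is in LR or CTR take no target operand.
NO_TARGET_FORMS = {"bclr", "bcctr", "bclrl", "bcctrl"}


def expand_cond(cond, crf=0, form="bc", target="target"):
    """Expand e.g. bne cr3,target into 'bc 4,14,target'."""
    bo, offset = CONDITIONS[cond]
    bi = 4 * crf + offset          # CR field crf defaults to CR0
    if form in NO_TARGET_FORMS:
        return f"{form} {bo},{bi}"
    return f"{form} {bo},{bi},{target}"


print(expand_cond("ne", 3))  # bc 4,14,target
```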
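Table F-11 shows the simplified branch mnemonics for the bc and bca instructions without
link register updating, and the syntax associated with these instructions. Note that the
default condition register specified by the simplified mnemonics in the table is CR0.


Table F-11. Simplified Branch Mnemonics for bc and bca Instructions with
Comparison Conditions and without Link Register Updating

LR Update Not Enabled

Branch Semantics | bc Relative | Simplified Mnemonic | bca Absolute | Simplified Mnemonic
Branch if less than | bc 12,0,target | blt target | bca 12,0,target | blta target
Branch if less than or equal | bc 4,1,target | ble target | bca 4,1,target | blea target
Branch if equal | bc 12,2,target | beq target | bca 12,2,target | beqa target
Branch if greater than or equal | bc 4,0,target | bge target | bca 4,0,target | bgea target
Branch if greater than | bc 12,1,target | bgt target | bca 12,1,target | bgta target
Branch if not less than | bc 4,0,target | bnl target | bca 4,0,target | bnla target
Branch if not equal | bc 4,2,target | bne target | bca 4,2,target | bnea target
Branch if not greater than | bc 4,1,target | bng target | bca 4,1,target | bnga target
Branch if summary overflow | bc 12,3,target | bso target | bca 12,3,target | bsoa target
Branch if not summary overflow | bc 4,3,target | bns target | bca 4,3,target | bnsa target
Branch if unordered | bc 12,3,target | bun target | bca 12,3,target | buna target
Branch if not unordered | bc 4,3,target | bnu target | bca 4,3,target | bnua target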
Table F-12 shows the simplified branch mnemonics for the bclr and bcctr instructions
without link register updating, and the syntax associated with these instructions. Note that
the default condition register specified by the simplified mnemonics in the table is CR0.


Table F-12. Simplified Branch Mnemonics for bclr and bcctr Instructions with
Comparison Conditions and without Link Register Updating

LR Update Not Enabled

Branch Semantics | bclr to LR | Simplified Mnemonic | bcctr to CTR | Simplified Mnemonic
Branch if less than | bclr 12,0 | bltlr | bcctr 12,0 | bltctr
Branch if less than or equal | bclr 4,1 | blelr | bcctr 4,1 | blectr
Branch if equal | bclr 12,2 | beqlr | bcctr 12,2 | beqctr
Branch if greater than or equal | bclr 4,0 | bgelr | bcctr 4,0 | bgectr
Branch if greater than | bclr 12,1 | bgtlr | bcctr 12,1 | bgtctr
Branch if not less than | bclr 4,0 | bnllr | bcctr 4,0 | bnlctr
Branch if not equal | bclr 4,2 | bnelr | bcctr 4,2 | bnectr
Branch if not greater than | bclr 4,1 | bnglr | bcctr 4,1 | bngctr
Branch if summary overflow | bclr 12,3 | bsolr | bcctr 12,3 | bsoctr
Branch if not summary overflow | bclr 4,3 | bnslr | bcctr 4,3 | bnsctr
Branch if unordered | bclr 12,3 | bunlr | bcctr 12,3 | bunctr
Branch if not unordered | bclr 4,3 | bnulr | bcctr 4,3 | bnuctr
Table F-13 shows the simplified branch mnemonics for the bcl and bcla instructions with
link register updating, and the syntax associated with these instructions. Note that the
default condition register specified by the simplified mnemonics in the table is CR0.


Table F-13. Simplified Branch Mnemonics for bcl and bcla Instructions with
Comparison Conditions and Link Register Update

LR Update Enabled

Branch Semantics | bcl Relative | Simplified Mnemonic | bcla Absolute | Simplified Mnemonic
Branch if less than | bcl 12,0,target | bltl target | bcla 12,0,target | bltla target
Branch if less than or equal | bcl 4,1,target | blel target | bcla 4,1,target | blela target
Branch if equal | bcl 12,2,target | beql target | bcla 12,2,target | beqla target
Branch if greater than or equal | bcl 4,0,target | bgel target | bcla 4,0,target | bgela target
Branch if greater than | bcl 12,1,target | bgtl target | bcla 12,1,target | bgtla target
Branch if not less than | bcl 4,0,target | bnll target | bcla 4,0,target | bnlla target
Branch if not equal | bcl 4,2,target | bnel target | bcla 4,2,target | bnela target
Branch if not greater than | bcl 4,1,target | bngl target | bcla 4,1,target | bngla target
Branch if summary overflow | bcl 12,3,target | bsol target | bcla 12,3,target | bsola target
Branch if not summary overflow | bcl 4,3,target | bnsl target | bcla 4,3,target | bnsla target
Branch if unordered | bcl 12,3,target | bunl target | bcla 12,3,target | bunla target
Branch if not unordered | bcl 4,3,target | bnul target | bcla 4,3,target | bnula target
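Table F-14 shows the simplified branch mnemonics for the bclrl and bcctrl instructions with
link register updating, and the syntax associated with these instructions. Note that the
default condition register specified by the simplified mnemonics in the table is CR0.


Table F-14. Simplified Branch Mnemonics for bclrl and bcctrl Instructions with
Comparison Conditions and Link Register Update

LR Update Enabled

Branch Semantics | bclrl to LR | Simplified Mnemonic | bcctrl to CTR | Simplified Mnemonic
Branch if less than | bclrl 12,0 | bltlrl | bcctrl 12,0 | bltctrl
Branch if less than or equal | bclrl 4,1 | blelrl | bcctrl 4,1 | blectrl
Branch if equal | bclrl 12,2 | beqlrl | bcctrl 12,2 | beqctrl
Branch if greater than or equal | bclrl 4,0 | bgelrl | bcctrl 4,0 | bgectrl
Branch if greater than | bclrl 12,1 | bgtlrl | bcctrl 12,1 | bgtctrl
Branch if not less than | bclrl 4,0 | bnllrl | bcctrl 4,0 | bnlctrl
Branch if not equal | bclrl 4,2 | bnelrl | bcctrl 4,2 | bnectrl
Branch if not greater than | bclrl 4,1 | bnglrl | bcctrl 4,1 | bngctrl
Branch if summary overflow | bclrl 12,3 | bsolrl | bcctrl 12,3 | bsoctrl
Branch if not summary overflow | bclrl 4,3 | bnslrl | bcctrl 4,3 | bnsctrl
Branch if unordered | bclrl 12,3 | bunlrl | bcctrl 12,3 | bunctrl
Branch if not unordered | bclrl 4,3 | bnulrl | bcctrl 4,3 | bnuctrl


F.5.4 Branch Prediction


In branch conditional instructions that are not always taken, the low-order bit (y bit) of the
BO field provides a hint about whether the branch is likely to be taken. See Section 4.2.4.2,
“Conditional Branch Control,” for more information on the y bit.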





Assemblers should clear this bit unless otherwise directed. This default action indicates the 
following: 


• A branch conditional with a negative displacement field is predicted to be taken.


• A branch conditional with a non-negative displacement field is predicted not to be
taken (fall through).


• A branch conditional to an address in the LR or CTR is predicted not to be taken (fall
through).
If the likely outcome (branch or fall through) of a given branch conditional instruction is
known, a suffix can be added to the mnemonic that tells the assembler how to set the y bit.
That is, ‘+’ indicates that the branch is to be taken and ‘-’ indicates that the branch is not
to be taken. Such a suffix can be added to any branch conditional mnemonic, either basic
or simplified.


For relative and absolute branches (bc[l][a]), the setting of the y bit depends on whether the
displacement field is negative or non-negative. For negative displacement fields, coding the
suffix ‘+’ causes the bit to be cleared, and coding the suffix ‘-’ causes the bit to be set. For
non-negative displacement fields, coding the suffix ‘+’ causes the bit to be set, and coding
the suffix ‘-’ causes the bit to be cleared.


For branches to an address in the LR or CTR (bclr[l] or bcctr[l]), coding the suffix ‘+’
causes the y bit to be set, and coding the suffix ‘-’ causes the bit to be cleared.
Examples of branch prediction follow: 


1. Branch if CR0 reflects condition “less than,” specifying that the branch should be
predicted to be taken.
blt+ target

2. Same as (1), but target address is in the LR and the branch should be predicted not
to be taken.
bltlr-
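The suffix rules above amount to a small decision table. The Python sketch below computes the y bit an assembler would encode for a ‘+’, ‘-’, or absent suffix, with `displacement=None` standing for a bclr or bcctr target. It also derives the resulting static prediction, assuming, as the default rules above imply, that setting y reverses the default prediction. Both function names are hypothetical.

```python
def y_bit(suffix, displacement=None):
    """y bit for suffix '+', '-', or '' ('' = no hint, assembler clears y).
    displacement is None for branches to an address in LR or CTR."""
    if suffix == "":
        return 0
    likely_taken = suffix == "+"
    if displacement is None:           # bclr[l] / bcctr[l]: '+' sets y
        return 1 if likely_taken else 0
    if displacement < 0:               # backward branch: default is "taken"
        return 0 if likely_taken else 1
    return 1 if likely_taken else 0    # forward branch: default is "not taken"


def predicted_taken(y, displacement=None):
    """Static prediction: the default rule, reversed when y = 1."""
    default_taken = displacement is not None and displacement < 0
    return default_taken != bool(y)


# blt+ target with a backward (negative) displacement: y stays 0, BO = 12.
y = y_bit("+", -16)
print(12 | y, predicted_taken(y, -16))  # 12 True
```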


F.6 Simplified Mnemonics for Condition Register 
Logical Instructions 


The condition register logical instructions, shown in Table F-15, can be used to set, clear, 
copy, or invert a given condition register bit. Simplified mnemonics are provided that allow 
these operations to be coded easily. Note that the symbols defined in Section F.1, 
“Symbols,” can be used to identify the condition register bit. 


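Table F-15. Condition Register Logical Mnemonics

Operation | Simplified Mnemonic | Equivalent to
Condition register set | crset bx | creqv bx,bx,bx
Condition register clear | crclr bx | crxor bx,bx,bx
Condition register move | crmove bx,by | cror bx,by,by
Condition register not | crnot bx,by | crnor bx,by,by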
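Each simplified mnemonic works because the same bit is used for both source operands: x EQV x is always 1, x XOR x is always 0, y OR y is y, and NOT (y OR y) is NOT y. The Python sketch below models the four underlying CR logical instructions on a 32-bit CR value (bit 0 is the most-significant bit) and builds the simplified forms on top of them. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def get_bit(cr, n):
    """CR bit n, where bit 0 is the most-significant bit."""
    return (cr >> (31 - n)) & 1


def put_bit(cr, n, value):
    """Return cr with bit n replaced by value (0 or 1)."""
    m = 1 << (31 - n)
    return (cr | m) if value else (cr & ~m & MASK32)


def cr_logical(cr, bt, ba, bb, op):
    """Generic CR logical instruction: CR[bt] = op(CR[ba], CR[bb])."""
    return put_bit(cr, bt, op(get_bit(cr, ba), get_bit(cr, bb)))


def creqv(cr, bt, ba, bb):
    return cr_logical(cr, bt, ba, bb, lambda a, b: 1 - (a ^ b))


def crxor(cr, bt, ba, bb):
    return cr_logical(cr, bt, ba, bb, lambda a, b: a ^ b)


def cror(cr, bt, ba, bb):
    return cr_logical(cr, bt, ba, bb, lambda a, b: a | b)


def crnor(cr, bt, ba, bb):
    return cr_logical(cr, bt, ba, bb, lambda a, b: 1 - (a | b))


def crset(cr, bx):
    return creqv(cr, bx, bx, bx)   # x EQV x = 1


def crclr(cr, bx):
    return crxor(cr, bx, bx, bx)   # x XOR x = 0


def crmove(cr, bx, by):
    return cror(cr, bx, by, by)    # y OR y = y


def crnot(cr, bx, by):
    return crnor(cr, bx, by, by)   # NOT (y OR y) = NOT y


print(hex(crset(0, 25)))  # 0x40
```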
Examples using the condition register logical mnemonics follow:


1. Set CR bit 25.
crset 25 (equivalent to creqv 25,25,25)

2. Clear the SO bit of CR0.
crclr so (equivalent to crxor 3,3,3)

3. Same as (2), but SO bit to be cleared is in CR3.
crclr 4 * cr3 + so (equivalent to crxor 15,15,15)

4. Invert the EQ bit.
crnot eq,eq (equivalent to crnor 2,2,2)

5. Same as (4), but EQ bit to be inverted is in CR4, and the result is to be placed into
the EQ bit of CR5.
crnot 4 * cr5 + eq, 4 * cr4 + eq (equivalent to crnor 22,18,18)


F.7 Simplified Mnemonics for Trap Instructions


A standard set of codes, shown in Table F-16, has been adopted for the most common 
combinations of trap conditions. 


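Table F-16. Standard Codes for Trap Instructions

Code | Description | TO Encoding | < | > | = | <U | >U
lt | Less than | 16 | 1 | 0 | 0 | 0 | 0
le | Less than or equal | 20 | 1 | 0 | 1 | 0 | 0
eq | Equal | 4 | 0 | 0 | 1 | 0 | 0
ge | Greater than or equal | 12 | 0 | 1 | 1 | 0 | 0
gt | Greater than | 8 | 0 | 1 | 0 | 0 | 0
nl | Not less than | 12 | 0 | 1 | 1 | 0 | 0
ne | Not equal | 24 | 1 | 1 | 0 | 0 | 0
ng | Not greater than | 20 | 1 | 0 | 1 | 0 | 0
llt | Logically less than | 2 | 0 | 0 | 0 | 1 | 0
lle | Logically less than or equal | 6 | 0 | 0 | 1 | 1 | 0
lge | Logically greater than or equal | 5 | 0 | 0 | 1 | 0 | 1
lgt | Logically greater than | 1 | 0 | 0 | 0 | 0 | 1
lnl | Logically not less than | 5 | 0 | 0 | 1 | 0 | 1
lng | Logically not greater than | 6 | 0 | 0 | 1 | 1 | 0
(none) | Unconditional | 31 | 1 | 1 | 1 | 1 | 1

Note: The symbol “<U” indicates that an unsigned less than evaluation will be performed. The symbol “>U”
indicates that an unsigned greater than evaluation will be performed.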
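The mnemonics defined in Table F-18 are variations of trap instructions, with the most
useful values of TO represented in the mnemonic rather than specified as a numeric
operand.


Table F-18. Trap Mnemonics

32-Bit Comparison

Trap Semantics | twi Immediate | tw Register
Trap unconditionally | - | trap
Trap if less than | twlti | twlt
Trap if less than or equal | twlei | twle
Trap if equal | tweqi | tweq
Trap if greater than or equal | twgei | twge
Trap if greater than | twgti | twgt
Trap if not less than | twnli | twnl
Trap if not equal | twnei | twne
Trap if not greater than | twngi | twng
Trap if logically less than | twllti | twllt
Trap if logically less than or equal | twllei | twlle
Trap if logically greater than or equal | twlgei | twlge
Trap if logically greater than | twlgti | twlgt
Trap if logically not less than | twlnli | twlnl
Trap if logically not greater than | twlngi | twlng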
Examples of the uses of trap mnemonics shown in Table F-18 follow:


1. Trap if register rA is not zero.
twnei rA,0 (equivalent to twi 24,rA,0)

2. Trap if register rA is not equal to rB.
twne rA,rB (equivalent to tw 24,rA,rB)

3. Trap if rA is logically greater than 0x7FF.
twlgti rA,0x7FF (equivalent to twi 1,rA,0x7FF)

4. Trap unconditionally.
trap (equivalent to tw 31,0,0)


Trap instructions evaluate a trap condition as follows:


• The contents of register rA are compared with either the sign-extended SIMM field
or the contents of register rB, depending on the trap instruction.


The comparison results in five conditions, which are ANDed with operand TO. If the result
is not 0, the trap exception handler is invoked. (Note that exceptions are referred to as
interrupts in the architecture specification.) See Table F-19 for these conditions.


Table F-19. TO Operand Bit Encoding

TO Bit | ANDed with Condition
0 | Less than, using signed comparison
1 | Greater than, using signed comparison
2 | Equal
3 | Less than, using unsigned comparison
4 | Greater than, using unsigned comparison
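The evaluation just described can be modeled directly. The Python sketch below compares two 32-bit values both as signed and as unsigned integers, builds the five conditions in TO-bit order (bit 0 is the most-significant bit of the 5-bit field, and bit 2 is assumed to select “equal”), and traps if any condition selected by TO holds. For twi, the caller passes the sign-extended SIMM as the second operand. The function names are hypothetical.

```python
def to_signed(x):
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def trap_taken(to, a, b):
    """True if tw/twi with operand TO would trap on (rA) = a and b."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    sa, sb = to_signed(a), to_signed(b)
    # TO bits 0..4: <, >, = (signed), <U, >U (unsigned).
    conditions = [sa < sb, sa > sb, a == b, a < b, a > b]
    return any(c and (to >> (4 - i)) & 1 for i, c in enumerate(conditions))


# twlgti rA,0x7FF (twi 1,rA,0x7FF) with rA = 0xFFFFFFFF: logically greater.
print(trap_taken(1, 0xFFFFFFFF, 0x7FF))  # True
```

The last line illustrates why the logical (unsigned) forms exist: 0xFFFFFFFF is -1 as a signed value, so a signed “greater than” test (TO = 8) would not trap on it.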





F.8 Simplified Mnemonics for Special-Purpose 
Registers 


The mtspr and mfspr instructions specify a special-purpose register (SPR) as a numeric 
operand. Simplified mnemonics are provided that represent the SPR in the mnemonic rather 
than requiring it to be coded as a numeric operand. Table F-20 provides a list of the 
simplified mnemonics that should be provided by assemblers for SPR operations. 
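Table F-20. Simplified Mnemonics for SPRs

Special-Purpose Register | Move to SPR: Simplified Mnemonic | Equivalent to | Move from SPR: Simplified Mnemonic | Equivalent to
XER | mtxer rS | mtspr 1,rS | mfxer rD | mfspr rD,1
Link register | mtlr rS | mtspr 8,rS | mflr rD | mfspr rD,8
Count register | mtctr rS | mtspr 9,rS | mfctr rD | mfspr rD,9
DSISR | mtdsisr rS | mtspr 18,rS | mfdsisr rD | mfspr rD,18
Data address register | mtdar rS | mtspr 19,rS | mfdar rD | mfspr rD,19
Decrementer | mtdec rS | mtspr 22,rS | mfdec rD | mfspr rD,22
SDR1 | mtsdr1 rS | mtspr 25,rS | mfsdr1 rD | mfspr rD,25
Save and restore register 0 | mtsrr0 rS | mtspr 26,rS | mfsrr0 rD | mfspr rD,26
Save and restore register 1 | mtsrr1 rS | mtspr 27,rS | mfsrr1 rD | mfspr rD,27
SPRG0-SPRG3 | mtsprg n,rS | mtspr 272 + n,rS | mfsprg rD,n | mfspr rD,272 + n
Address space register | mtasr rS | mtspr 280,rS | mfasr rD | mfspr rD,280
External access register | mtear rS | mtspr 282,rS | mfear rD | mfspr rD,282
Processor version register | - | - | mfpvr rD | mfspr rD,287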


Following are examples using the SPR simplified mnemonics found in Table F-20: 


1. Copy the contents of rS to the XER. 





mtxer rS (equivalent to mtspr 1,rS) 
2. Copy the contents of the LR to rS. 

mflr rS (equivalent to mfspr rS,8)
3. Copy the contents of rS to the CTR. 

mtctr rS (equivalent to mtspr 9,rS) 
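The SPR mnemonics are a name-to-number substitution. Note the operand order: the SPR number comes first for mtspr and second for mfspr. The Python sketch below performs that expansion. The SPR numbers are the architected values for the registers listed for Table F-20, where XER = 1, LR = 8, and CTR = 9 match the examples above. Treating PVR as read-only follows the table, which gives no move-to form for it. The function names are hypothetical.

```python
SPR_NUMBERS = {
    "xer": 1, "lr": 8, "ctr": 9, "dsisr": 18, "dar": 19, "dec": 22,
    "sdr1": 25, "srr0": 26, "srr1": 27, "asr": 280, "ear": 282, "pvr": 287,
}
READ_ONLY = {"pvr"}   # no move-to-SPR form is listed for PVR
SPRG_BASE = 272       # SPRG0-SPRG3 are SPR 272 + n


def expand_spr(mnemonic, reg):
    """'mtxer rS' -> 'mtspr 1,rS'; 'mflr rD' -> 'mfspr rD,8'."""
    direction, name = mnemonic[:2], mnemonic[2:]
    spr = SPR_NUMBERS[name]
    if direction == "mt":
        if name in READ_ONLY:
            raise ValueError(f"{name} has no move-to form")
        return f"mtspr {spr},{reg}"
    if direction == "mf":
        return f"mfspr {reg},{spr}"
    raise ValueError(f"not an mt/mf mnemonic: {mnemonic}")


def expand_sprg(direction, n, reg):
    """mtsprg n,rS -> mtspr 272 + n,rS; mfsprg rD,n -> mfspr rD,272 + n."""
    if not 0 <= n <= 3:
        raise ValueError("SPRG index must be 0-3")
    spr = SPRG_BASE + n
    return f"mtspr {spr},{reg}" if direction == "mt" else f"mfspr {reg},{spr}"


print(expand_spr("mflr", "rS"))  # mfspr rS,8
```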


F-22 PowerPC Microprocessor Family: The Programming Environments (32-Bit) 


F.9 Recommended Simplified Mnemonics 


This section describes some of the most commonly-used operations (such as no-op, load 
immediate, load address, move register, and complement register). 


F.9.1 No-Op (nop) 


Many PowerPC instructions can be coded in a way that, effectively, no operation is
performed. An additional mnemonic is provided for the preferred form of no-op. If an
implementation performs any type of run-time optimization related to no-ops, the preferred
form is the no-op that triggers that optimization. The preferred form is coded as follows:


nop (equivalent to ori 0,0,0) 
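Because the preferred no-op is a specific instruction, it also has a specific 32-bit encoding. The Python sketch below packs a D-form instruction (6-bit primary opcode, two 5-bit register fields, 16-bit immediate) and encodes ori 0,0,0. It assumes ori's primary opcode is 24, and the function name is hypothetical.

```python
ORI_PRIMARY_OPCODE = 24  # assumed primary opcode of ori


def encode_d_form(opcode, rs, ra, imm16):
    """Pack opcode(6) | rS(5) | rA(5) | immediate(16) into one 32-bit word."""
    return (((opcode & 0x3F) << 26) | ((rs & 0x1F) << 21)
            | ((ra & 0x1F) << 16) | (imm16 & 0xFFFF))


NOP = encode_d_form(ORI_PRIMARY_OPCODE, 0, 0, 0)  # ori 0,0,0
print(f"nop = 0x{NOP:08X}")  # nop = 0x60000000
```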


F.9.2 Load Immediate (li) 


The addi and addis instructions can be used to load an immediate value into a register. 
Additional mnemonics are provided to convey the idea that no addition is being performed 
but that data is being moved from the immediate operand of the instruction to a register. 


1. Load a 16-bit signed immediate value into rD. 


li rD,value (equivalent to addi rD,0,value)
2. Load a 16-bit signed immediate value, shifted left by 16 bits, into rD. 
lis rD,value (equivalent to addis rD,0,value) 
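These mnemonics work because, in addi and addis, an rA operand of 0 denotes the value 0 rather than the contents of GPR0, so the result is just the (shifted) immediate. The Python sketch below models that rule with a list standing in for the GPR file. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def addi(gpr, rd, ra, simm):
    """addi rD,rA,SIMM: rA = 0 means the literal value 0, not GPR0."""
    base = 0 if ra == 0 else gpr[ra]
    gpr[rd] = (base + sign_extend16(simm)) & MASK32


def addis(gpr, rd, ra, simm):
    """addis rD,rA,SIMM: add the immediate shifted left 16 bits."""
    base = 0 if ra == 0 else gpr[ra]
    gpr[rd] = (base + (sign_extend16(simm) << 16)) & MASK32


def li(gpr, rd, value):
    addi(gpr, rd, 0, value)    # li rD,value == addi rD,0,value


def lis(gpr, rd, value):
    addis(gpr, rd, 0, value)   # lis rD,value == addis rD,0,value


gpr = [0] * 32
gpr[0] = 99                    # ignored by li and lis, since rA = 0
li(gpr, 3, -1)
lis(gpr, 4, 0x1234)
print(hex(gpr[3]), hex(gpr[4]))  # 0xffffffff 0x12340000
```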


F.9.3 Load Address (la) 


This mnemonic permits computing the value of a base-displacement operand, using the
addi instruction, which normally requires separate register and immediate operands.


la rD,d(rA) (equivalent to addi rD,rA,d) 


The la mnemonic is useful for obtaining the address of a variable specified by name,
allowing the assembler to supply the base register number and compute the displacement.
If the variable v is located at offset dv bytes from the address in register rv, and the
assembler has been told to use register rv as a base for references to the data structure
containing v, the following line causes the address of v to be loaded into register rD:


la rD,v (equivalent to addi rD,rv,dv)
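The assembler's part of this can be sketched as a symbol lookup. In the Python sketch below, the base register (r30) and the symbol offsets are hypothetical stand-ins for whatever the assembler has been told about rv and dv. The sketch rewrites `la rD,v` as `addi rD,rv,dv` and computes the resulting address. The 16-bit range check reflects the size of addi's immediate field.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical assembler state: r30 is the declared base register (rv),
# and each symbol's byte offset (dv) within the data structure is known.
BASE_REGISTER = 30
SYMBOL_OFFSETS = {"v": 0x40, "w": 0x48}


def expand_la(rd, symbol):
    """la rD,v -> addi rD,rv,dv"""
    dv = SYMBOL_OFFSETS[symbol]
    if not -0x8000 <= dv <= 0x7FFF:
        raise ValueError("displacement must fit in a signed 16-bit field")
    return f"addi r{rd},r{BASE_REGISTER},{dv}"


def la_result(base_value, d):
    """Value left in rD: (rA) + d, truncated to 32 bits."""
    return (base_value + d) & MASK32


print(expand_la(3, "v"))                  # addi r3,r30,64
print(hex(la_result(0x10000000, 0x40)))   # 0x10000040
```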


F.9.4 Move Register (mr) 


Several PowerPC instructions can be coded to copy the contents of one register to another. 
A simplified mnemonic is provided that signifies that no computation is being performed, 
but merely that data is being moved from one register to another. 


The following instruction copies the contents of rS into rA. This mnemonic can be coded 
with a dot (.) suffix to cause the Rc bit to be set in the underlying instruction. 


mr rA,rS (equivalent to or rA,rS,rS)
F.9.5 Complement Register (not)


Several PowerPC instructions can be coded in a way that they complement the contents of 
one register and place the result into another register. A simplified mnemonic is provided 
that allows this operation to be coded easily. 


The following instruction complements the contents of rS and places the result into rA. 
This mnemonic can be coded with a dot (.) suffix to cause the Rc bit to be set in the 
underlying instruction. 


not rA,rS (equivalent to nor rA,rS,rS)
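Both mnemonics rely on using the same register as both source operands: x OR x is x, and NOT (x OR x) is NOT x. The short Python check below confirms this on 32-bit values. The helper names are hypothetical, and the Rc (dot-suffix) behavior is not modeled.

```python
MASK32 = 0xFFFFFFFF


def or_(rs, rb):
    return (rs | rb) & MASK32


def nor(rs, rb):
    return ~(rs | rb) & MASK32


def mr(rs):
    """mr rA,rS == or rA,rS,rS."""
    return or_(rs, rs)


def not_(rs):
    """not rA,rS == nor rA,rS,rS."""
    return nor(rs, rs)


for x in (0, 1, 0x0F0F0F0F, 0x80000000, MASK32):
    assert mr(x) == x
    assert not_(x) == (~x & MASK32)
```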


F.9.6 Move to Condition Register (mtcr) 


This mnemonic permits copying the contents of a GPR to the condition register, using the
same syntax as the mfcr instruction.


mtcr rS (equivalent to mtcrf 0xFF,rS)
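mtcrf uses an 8-bit field mask (FXM) to select which of the eight 4-bit CR fields are written. With FXM = 0xFF every field is selected, which is why mtcr needs no mask operand. The Python sketch below assumes FXM bit 0 (value 0x80) selects CR0, the most-significant field. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mtcrf(fxm, rs, cr):
    """Copy CR field i from rS into cr for each FXM bit i that is set."""
    mask = 0
    for i in range(8):
        if (fxm >> (7 - i)) & 1:          # FXM bit i selects CR field i
            mask |= 0xF << (28 - 4 * i)   # CR field i occupies bits 4i..4i+3
    return (rs & mask) | (cr & ~mask & MASK32)


def mtcr(rs, cr):
    """mtcr rS == mtcrf 0xFF,rS: all eight fields are copied."""
    return mtcrf(0xFF, rs, cr)


print(hex(mtcrf(0x80, 0x12345678, 0xFFFFFFFF)))  # 0x1fffffff
```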




Glossary of Terms and Abbreviations 


The glossary contains an alphabetical list of terms, phrases, and abbreviations used in this 
book. Some of the terms and definitions included in the glossary are reprinted from IEEE
Std. 754-1985, IEEE Standard for Binary Floating-Point Arithmetic, copyright ©1985 by 
the Institute of Electrical and Electronics Engineers, Inc. with the permission of the IEEE. 


Note that some terms are defined in the context of how they are used in this book. 


A Architecture. A detailed specification of requirements for a processor or 
computer system. It does not specify details of how the processor or 
computer system must be implemented; instead it provides a 
template for a family of compatible implementations. 


Asynchronous exception. Exceptions that are caused by events external to 
the processor’s execution. In this document, the term ‘asynchronous 
exception’ is used interchangeably with the word interrupt. 


Atomic access. A bus access that attempts to be part of a read-write operation 
to the same address uninterrupted by any other access to that address 
(the term refers to the fact that the transactions are indivisible). The 
PowerPC architecture implements atomic accesses through the 
lwarx/stwcx. instruction pair.


B BAT (block address translation) mechanism. A software-controlled array 
that stores the available block address translations on-chip. 


Biased exponent. An exponent whose range of values is shifted by a constant 
(bias). Typically a bias is provided to allow a range of positive values 
to express a range that includes both positive and negative values. 


Big-endian. A byte-ordering method in memory where the address n of a 
word corresponds to the most-significant byte. In an addressed 
memory word, the bytes are ordered (left to right) 0, 1, 2, 3, with 0 
being the most-significant byte. See Little-endian. 
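As a concrete illustration, the Python sketch below stores the word 0x01020304 in big-endian order and, for comparison, in little-endian order, then prints the byte found at each address offset. In the big-endian layout the most-significant byte (0x01) is at address n. It uses only the standard `struct` module.

```python
import struct

word = 0x01020304
big = struct.pack(">I", word)     # big-endian: MSB at the lowest address
little = struct.pack("<I", word)  # little-endian, for comparison

for offset in range(4):
    print(f"address n+{offset}: big=0x{big[offset]:02X} little=0x{little[offset]:02X}")
```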


Block. An area of memory that ranges from 128 Kbyte to 256 Mbyte, whose 
size, translation, and protection attributes are controlled by the BAT 
mechanism. 
Boundedly undefined. A characteristic of results of certain operations that
are not rigidly prescribed by the PowerPC architecture. Boundedly-undefined
results for a given operation may vary among implementations, and between
execution attempts in the same implementation.


Although the architecture does not prescribe the exact behavior for
when results are allowed to be boundedly undefined, the results of
executing instructions in contexts where results are allowed to be
boundedly undefined are constrained to ones that could have been
achieved by executing an arbitrary sequence of defined instructions,
in valid form, starting in the state the machine was in before
attempting to execute the given instruction.


C Cache. High-speed memory component containing recently-accessed data
and/or instructions (subset of main memory).


Cache block. A small region of contiguous memory that is copied from
memory into a cache. The size of a cache block may vary among
processors; the maximum block size is one page. In PowerPC
processors, cache coherency is maintained on a cache-block basis.
Note that the term ‘cache block’ is often used interchangeably with
‘cache line’.


Cache coherency. An attribute wherein an accurate and common view of
memory is provided to all devices that share the same memory
system. Caches are coherent if a processor performing a read from
its cache is supplied with data corresponding to the most recent value
written to memory or to another processor’s cache.


Cache flush. An operation that removes from a cache any data from a
specified address range. This operation ensures that any modified
data within the specified address range is written back to main
memory. This operation is generated typically by a Data Cache
Block Flush (dcbf) instruction.


Caching-inhibited. A memory update policy in which the cache is bypassed
and the load or store is performed to or from main memory.


Cast-outs. Cache blocks that must be written to memory when a cache miss
causes a cache block to be replaced.


Changed bit. One of two page history bits found in each page table entry
(PTE). The processor sets the changed bit if any store is performed
into the page. See also Page access history bits and Referenced bit.
Clear. To cause a bit or bit field to register a value of zero. See also Set.


Context synchronization. An operation that ensures that all instructions in 
execution complete past the point where they can produce an 
exception, that all instructions in execution complete in the context 
in which they began execution, and that all subsequent instructions 
are fetched and executed in the new context. Context synchronization 
may result from executing specific instructions (such as isync or rfi)
or when certain events occur (such as an exception). 


Copy-back. An operation in which modified data in a cache block is copied 
back to memory. 


D Denormalized number. A nonzero floating-point number whose exponent 
has a reserved value, usually the format's minimum, and whose 
explicit or implicit leading significand bit is zero. 


Direct-mapped cache. A cache in which each main memory address can
appear in only one location within the cache; it operates more quickly
when the memory request is a cache hit.


Direct-store. Interface available on PowerPC processors only to support 
direct-store devices from the POWER architecture. When the T bit 
of a segment descriptor is set, the descriptor defines the region of 
memory that is to be used as a direct-store segment. Note that this 
facility is being phased out of the architecture and will not likely be 
supported in future devices. Therefore, software should not depend 
on it and new software should not use it. 


E Effective address (EA). The 32- or 64-bit address specified for a load, store, 
or an instruction fetch. This address is then submitted to the MMU 
for translation to either a physical memory address or an I/O address. 


Exception. A condition encountered by the processor that requires special, 
supervisor-level processing. 


Exception handler. A software routine that executes when an exception is 
taken. Normally, the exception handler corrects the condition that 
caused the exception, or performs some other meaningful task (that 
may include aborting the program that caused the exception). The 
address for each exception handler is identified by an exception 
vector offset defined by the architecture and a prefix selected via the 
MSR. 
Extended opcode. A secondary opcode field generally located in instruction
bits 21-30, that further defines the instruction type. All PowerPC 
instructions are one word in length. The most significant 6 bits of the 
instruction are the primary opcode, identifying the type of 
instruction. See also Primary opcode. 


Execution synchronization. A mechanism by which all instructions in 
execution are architecturally complete before beginning execution 
(appearing to begin execution) of the next instruction. Similar to 
context synchronization but doesn't force the contents of the 
instruction buffers to be deleted and refetched. 


Exponent. In the binary representation of a floating-point number, the 
exponent is the component that normally signifies the integer power 
to which the value two is raised in determining the value of the 
represented number. See also Biased exponent. 


F Fetch. Retrieving instructions from either the cache or main memory and 
placing them into the instruction queue. 


Floating-point register (FPR). Any of the 32 registers in the floating-point 
register file. These registers provide the source operands and 
destination results for floating-point instructions. Load instructions 
move data from memory to FPRs and store instructions move data 
from FPRs to memory. The FPRs are 64 bits wide and store floating- 
point values in double-precision format. 


Fraction. In the binary representation of a floating-point number, the field of 
the significand that lies to the right of its implied binary point. 


Fully-associative. Addressing scheme where every cache location (every 
byte) can have any possible address. 


G General-purpose register (GPR). Any of the 32 registers in the general- 
purpose register file. These registers provide the source operands and 
destination results for all integer data manipulation instructions. 
Integer load instructions move data from memory to GPRs and store 
instructions move data from GPRs to memory. 


Guarded. The guarded attribute pertains to out-of-order execution. When a 
page is designated as guarded, instructions and data cannot be 
accessed out-of-order. 
H

Harvard architecture. An architectural model featuring separate caches for
instructions and data.


Hashing. An algorithm used in the page table search process. 


I

IEEE 754. A standard written by the Institute of Electrical and Electronics
Engineers that defines operations and representations of binary 
floating-point arithmetic. 


Illegal instructions. A class of instructions that are not implemented for a 
particular PowerPC processor. These include instructions not defined 
by the PowerPC architecture. In addition, for 32-bit 
implementations, instructions that are defined only for 64-bit 
implementations are considered to be illegal instructions. For 64-bit 
implementations, instructions that are defined only for 32-bit
implementations are considered to be illegal instructions. 


Implementation. A particular processor that conforms to the PowerPC 
architecture, but may differ from other architecture-compliant 
implementations, for example, in design, feature set, and
implementation of optional features. The PowerPC architecture has 
many different implementations. 


Implementation-dependent. An aspect of a feature in a processor’s design 
that is defined by a processor’s design specifications rather than by 
the PowerPC architecture. 


Implementation-specific. An aspect of a feature in a processor’s design that 
is not required by the PowerPC architecture, but for which the 
PowerPC architecture may provide concessions to ensure that 
processors that implement the feature do so consistently. 


Imprecise exception. A type of synchronous exception that is allowed not to 
adhere to the precise exception model (see Precise exception). The 
PowerPC architecture allows only floating-point exceptions to be 
handled imprecisely. 


Inexact. Loss of accuracy in an arithmetic operation when the rounded result 
differs from the infinitely precise value with unbounded range. 


In-order. An aspect of an operation that adheres to a sequential model. An 
operation is said to be performed in-order if, at the time that it is 
performed, it is known to be required by the sequential execution 
model. See Out-of-order. 
Instruction latency. The total number of clock cycles necessary to execute
an instruction and make its results available.


Instruction parallelism. A feature of PowerPC processors that allows 
instructions to be processed in parallel. 


Interrupt. An asynchronous exception. On PowerPC processors, interrupts 
are a special case of exceptions. See also asynchronous exception. 


Invalid state. State of a cache entry that does not currently contain a valid 
copy of a cache block from memory. 


Key bits. A set of key bits referred to as Ks and Kp in each segment register 
and each BAT register. The key bits determine whether supervisor or 
user programs can access a page within that segment or block. 
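The sketch below shows how the key bits combine with a page's PP bits. It is a simplified model for page address translation only. It assumes the key comes from Ks in supervisor mode (MSR[PR] = 0) and from Kp in user mode (MSR[PR] = 1). The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum access { NO_ACCESS, READ_ONLY, READ_WRITE };

/* Select the key from the segment's Ks/Kp bits according to MSR[PR],
   then combine it with the page's 2-bit PP field. */
static enum access page_access(bool msr_pr, bool ks, bool kp, unsigned pp)
{
    assert(pp <= 3);
    bool key = msr_pr ? kp : ks;   /* user mode uses Kp, supervisor mode uses Ks */
    if (!key)
        return (pp == 3) ? READ_ONLY : READ_WRITE;
    switch (pp) {
    case 0:  return NO_ACCESS;
    case 2:  return READ_WRITE;
    default: return READ_ONLY;     /* PP = 01 or 11 */
    }
}
```

Consider a segment with Ks = 0 and Kp = 1. It gives the supervisor read/write access to a PP = 00 page. A user program gets no access to the same page.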


Kill. An operation that causes a cache block to be invalidated. 


L2 cache. See Secondary cache. 


Least-significant bit (lsb). The bit of least value in an address, register, data
element, or instruction encoding. 


Least-significant byte (LSB). The byte of least value in an address, register, 
data element, or instruction encoding. 


Little-endian. A byte-ordering method in memory where the address n of a 
word corresponds to the least-significant byte. In an addressed 
memory word, the bytes are ordered (left to right) 3, 2, 1, 0, with 3 
being the most-significant byte. See Big-endian. 
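A short C sketch shows both byte orders for the word 0x0A0B0C0D stored at address n, represented here by `mem[0]`. The function names are hypothetical. The code does not depend on the byte order of the host that runs it.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit word into four bytes of memory in each byte order. */
static void store_word_be(uint8_t mem[4], uint32_t w)
{
    mem[0] = (uint8_t)(w >> 24);   /* address n holds the most-significant byte */
    mem[1] = (uint8_t)(w >> 16);
    mem[2] = (uint8_t)(w >> 8);
    mem[3] = (uint8_t)w;
}

static void store_word_le(uint8_t mem[4], uint32_t w)
{
    mem[0] = (uint8_t)w;           /* address n holds the least-significant byte */
    mem[1] = (uint8_t)(w >> 8);
    mem[2] = (uint8_t)(w >> 16);
    mem[3] = (uint8_t)(w >> 24);
}

/* Reassemble a word from little-endian memory. */
static uint32_t load_word_le(const uint8_t mem[4])
{
    return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8) |
           ((uint32_t)mem[2] << 16) | ((uint32_t)mem[3] << 24);
}
```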


MESI (modified/exclusive/shared/invalid). Cache coherency protocol used 
to manage caches on different devices that share a memory system. 
Note that the PowerPC architecture does not specify the 
implementation of a MESI protocol to ensure cache coherency. 


Memory access ordering. The specific order in which the processor 
performs load and store memory accesses and the order in which 
those accesses complete. 


Memory-mapped accesses. Accesses whose addresses use the page or block 
address translation mechanisms provided by the MMU and that 
occur externally with the bus protocol defined for memory. 
Memory coherency. An aspect of caching in which it is ensured that an
accurate view of memory is provided to all devices that share system 
memory. 


Memory consistency. Refers to agreement of levels of memory with respect 
to a single processor and system memory (for example, on-chip 
cache, secondary cache, and system memory). 


Memory management unit (MMU). The functional unit that is capable of 
translating an effective (logical) address to a physical address, 
providing protection mechanisms, and defining caching methods. 


Microarchitecture. The hardware details of a microprocessor’s design. Such 
details are not defined by the PowerPC architecture. 


Mnemonic. The abbreviated name of an instruction used for coding. 


Modified state. When a cache block is in the modified state, it has been 
modified by the processor since it was copied from memory. See 
MESI. 


Munging. A modification performed on an effective address that allows it to
appear to the processor that individual aligned scalars are stored as
little-endian values, when in fact they are stored in big-endian order,
but at different byte addresses within doublewords. Note that munging
affects only the effective address and not the byte order. Note also
that this term is not used by the PowerPC architecture.
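Here is a minimal sketch of the munging step. It assumes the scheme used by 32-bit PowerPC implementations in little-endian mode (MSR[LE] = 1). The three low-order bits of an aligned scalar's effective address are XORed with a value that depends on the access size:

- 0b111 for bytes
- 0b110 for halfwords
- 0b100 for words

Doubleword addresses are left unchanged. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Munge the effective address of an aligned scalar access in
   little-endian mode by XORing its three low-order bits with a
   size-dependent constant. */
static uint32_t munge_ea(uint32_t ea, unsigned size)
{
    switch (size) {
    case 1: return ea ^ 0x7u;   /* byte */
    case 2: return ea ^ 0x6u;   /* halfword */
    case 4: return ea ^ 0x4u;   /* word */
    case 8: return ea;          /* doubleword: unchanged */
    default:
        assert(!"unsupported access size");
        return ea;
    }
}
```

For example, a byte the program addresses at 0x1000 is accessed at 0x1007, the opposite end of the same doubleword.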


Multiprocessing. The capability of software, especially operating systems, 
to support execution on more than one processor at the same time. 


Most-significant bit (msb). The highest-order bit in an address, register,
data element, or instruction encoding. 


Most-significant byte (MSB). The highest-order byte in an address, 
register, data element, or instruction encoding.


N

NaN. An abbreviation for ‘Not a Number’; a symbolic entity encoded in
floating-point format. There are two types of NaNs—signaling NaNs 
(SNaNs) and quiet NaNs (QNaNs). 


No-op. No-operation. A single-cycle operation that does not affect registers 
or generate bus activity. 


Normalization. A process by which a floating-point value is manipulated 
such that it can be represented in the format for the appropriate 
precision (single- or double-precision). For a floating-point value to 
be representable in the single- or double-precision format, the
leading implied bit must be a 1. 


OEA (operating environment architecture). The level of the architecture 
that describes the PowerPC memory management model, supervisor-level
registers, synchronization requirements, and the exception
model. It also defines the time-base feature from a supervisor-level 
perspective. Implementations that conform to the PowerPC OEA 
also conform to the PowerPC UISA and VEA. 


Optional. A feature, such as an instruction, a register, or an exception, that is 
defined by the PowerPC architecture but not required to be 
implemented. 


Out-of-order. An aspect of an operation that allows it to be performed ahead 
of one that may have preceded it in the sequential model, for 
example, speculative operations. An operation is said to be 
performed out-of-order if, at the time that it is performed, it is not 
known to be required by the sequential execution model. See 
In-order. 


Out-of-order execution. A technique that allows instructions to be issued 
and completed in an order that differs from their sequence in the 
instruction stream. 


Overflow. An error condition that occurs during arithmetic operations when 
the result cannot be stored accurately in the destination register(s). 
For example, if two 32-bit numbers are multiplied, the result may not 
be representable in 32 bits. 
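The multiplication example can be checked in C. The sketch forms the exact product in 64 bits and tests whether it fits in a signed 32-bit register. It is illustrative, not processor code. It mirrors the condition for which an overflow-enabled multiply such as `mullwo` sets XER[OV]. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Multiply two signed 32-bit values and report whether the full product
   fits in a 32-bit destination register. A 64-bit product is always exact. */
static bool mul32_overflows(int32_t a, int32_t b, int32_t *low_word)
{
    int64_t product = (int64_t)a * (int64_t)b;
    /* Keep the low-order 32 bits, as a 32-bit register would. Converting an
       out-of-range value to int32_t is implementation-defined in C;
       common compilers wrap. */
    *low_word = (int32_t)(uint32_t)product;
    return product < INT32_MIN || product > INT32_MAX;
}
```

For example, 65536 × 65536 = 2^32 overflows, and its low-order word is 0.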


Page. A region in memory. The OEA defines a page as a 4-Kbyte area of 
memory, aligned on a 4-Kbyte boundary. 


Page access history bits. The changed and referenced bits in the PTE keep 
track of the access history within the page. The referenced bit is set 
by the MMU whenever the page is accessed for a read or write 
operation. The changed bit is set when the page is stored into. See 
Changed bit and Referenced bit. 
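Below is a simplified model of the history update. It assumes the 32-bit PTE layout in which R and C are bits 23 and 24 of the second PTE word, with bit 0 as the most-significant bit. Implementations differ in when and how they perform these updates, so the sketch shows only the rule stated above. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* R and C in the second word of a 32-bit PTE (bit 0 = most-significant bit). */
#define PTE_R 0x00000100u   /* referenced: bit 23, i.e. 1 << (31 - 23) */
#define PTE_C 0x00000080u   /* changed:    bit 24, i.e. 1 << (31 - 24) */

/* Model of the history update for one access to the page. */
static uint32_t pte_record_access(uint32_t pte_word1, bool is_store)
{
    pte_word1 |= PTE_R;          /* any read or write sets R */
    if (is_store)
        pte_word1 |= PTE_C;      /* only stores set C */
    return pte_word1;
}
```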


Page fault. A page fault is a condition that occurs when the processor
attempts to access a memory location that resides within a page not
currently resident in physical memory. On PowerPC processors, a page
fault exception condition occurs when a matching, valid page table
entry (PTE[V] = 1) cannot be located.
Page table. A table in memory made up of page table entries, or PTEs.
It is further organized into eight PTEs per PTEG (page table entry
group). The number of PTEGs in the page table depends on the size
of the page table (as specified in the SDR1 register).
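The page table search begins by hashing. The sketch below assumes the 32-bit OEA scheme:

- The primary hash is the low-order 19 bits of the segment's VSID XORed with the 16-bit page index.
- The secondary hash is the one's complement of the primary hash.
- The 64-byte PTEG address is formed from HTABORG and HTABMASK in SDR1.

The parameter conventions and function names are assumptions for illustration. For example, HTABORG is passed as the page-table base address.

```c
#include <assert.h>
#include <stdint.h>

#define HASH_MASK 0x7FFFFu   /* hash values are 19 bits wide */

/* Primary hash: low-order 19 bits of the VSID XORed with the 16-bit page index. */
static uint32_t primary_hash(uint32_t vsid, uint32_t page_index)
{
    return ((vsid & HASH_MASK) ^ (page_index & 0xFFFFu)) & HASH_MASK;
}

/* Secondary hash: one's complement of the primary hash. */
static uint32_t secondary_hash(uint32_t hash1)
{
    return ~hash1 & HASH_MASK;
}

/* Physical address of the PTEG (8 PTEs x 8 bytes) selected by a hash.
   htaborg: page-table base (only its upper 16 bits are used).
   htabmask: 9-bit HTABMASK, which limits how many upper hash bits are used. */
static uint32_t pteg_address(uint32_t htaborg, uint32_t htabmask, uint32_t hash)
{
    uint32_t hi = (hash >> 10) & (htabmask & 0x1FFu);  /* hash bits 0-8, masked */
    uint32_t lo = hash & 0x3FFu;                        /* hash bits 9-18 */
    return (htaborg & 0xFFFF0000u) | (hi << 16) | (lo << 6);
}
```

If no matching PTE is found in the primary PTEG, the search repeats with the secondary hash.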


Page table entry (PTE). Data structures containing information used to 
translate an effective address to a physical address on a 4-Kbyte page
basis. A PTE consists of 8 bytes of information in a 32-bit processor 
and 16 bytes of information in a 64-bit processor. 


Physical memory. The actual memory that can be accessed through the 
system’s memory bus. 


Pipelining. A technique that breaks operations, such as instruction 
processing or bus transactions, into smaller distinct stages or tenures 
(respectively) so that a subsequent operation can begin before the 
previous one has completed. 


Precise exceptions. A category of exception for which the pipeline can be 
stopped so instructions that preceded the faulting instruction can 
complete, and subsequent instructions can be flushed and 
redispatched after exception handling has completed. See Imprecise 
exceptions. 


Primary opcode. The most-significant 6 bits (bits 0-5) of the instruction 
encoding that identify the type of instruction. See Secondary
opcode. 


Protection boundary. A boundary between protection domains. 


Protection domain. A protection domain is a segment, a virtual page, a BAT 
area, or a range of unmapped effective addresses. It is defined only 
when the appropriate relocate bit in the MSR (IR or DR) is 1. 


Quad word. A group of 16 contiguous locations starting at an address 
divisible by 16. 
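When the size is a power of two, an alignment check like the one in this definition reduces to a mask test. A minimal sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of size, where size is a power of two
   (4 for a word, 8 for a double word, 16 for a quad word). */
static bool is_aligned(uint32_t addr, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}
```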


Quiet NaN. A type of NaN that can propagate through most arithmetic 
operations without signaling exceptions. A quiet NaN is used to 
represent the results of certain invalid operations, such as invalid 
arithmetic operations on infinities or on NaNs, when invalid operation
exceptions are disabled (FPSCR[VE] = 0). See
Signaling NaN. 
rA. The rA instruction field is used to specify a GPR to be used as a source
or destination. 


rB. The rB instruction field is used to specify a GPR to be used as a source. 


rD. The rD instruction field is used to specify a GPR to be used as a 
destination. 


rS. The rS instruction field is used to specify a GPR to be used as a source. 


Real address mode. An MMU mode when no address translation is 
performed and the effective address specified is the same as the 
physical address. The processor’s MMU is operating in real address 
mode if its ability to perform address translation has been disabled 
by clearing the IR and/or DR bits in the MSR.


Record bit. Bit 31 (or the Rc bit) in the instruction encoding. When it is
set, the instruction updates the condition register (CR) to reflect the
result of the operation.
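Here is a sketch of how a record-form integer instruction sets CR0, assuming 32-bit mode. The result is compared with zero as a signed value, and XER[SO] is copied into the fourth CR0 bit. The sample words 0x7C642A14 and 0x7C642A15 are assumed to encode `add r3,r4,r5` and `add. r3,r4,r5`. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The four CR0 bits as they appear in the 4-bit field (LT is the leftmost). */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* Rc is bit 31 of the instruction, the least-significant bit of the word. */
static bool record_bit(uint32_t insn) { return (insn & 1u) != 0; }

/* CR0 field set by a record-form integer instruction: compare the 32-bit
   result with zero as a signed number, then copy in XER[SO]. */
static unsigned cr0_field(int32_t result, bool xer_so)
{
    unsigned cr = (result < 0) ? CR_LT : (result > 0) ? CR_GT : CR_EQ;
    return cr | (xer_so ? CR_SO : 0u);
}
```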


Referenced bit. One of two page history bits found in each page table entry 
(PTE). The processor sets the referenced bit whenever the page is 
accessed for a read or write. See also Page access history bits. 


Register indirect addressing. A form of addressing that specifies one GPR 
that contains the address for the load or store. 


Register indirect with immediate index addressing. A form of addressing 
that specifies an immediate value to be added to the contents of a 
specified GPR to form the target address for the load or store. 


Register indirect with index addressing. A form of addressing that specifies 
that the contents of two GPRs be added together to yield the target 
address for the load or store. 
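The three register-indirect forms can be sketched in C. The sketch assumes the architecture's (rA|0) convention: an rA field of 0 supplies the value 0 rather than the contents of GPR0. The 16-bit displacement is sign-extended. Plain register indirect addressing corresponds to a displacement of 0. The GPR array and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* (rA|0): a register field of 0 selects the value 0, not GPR0. */
static uint32_t ra_or_zero(const uint32_t gpr[32], unsigned ra)
{
    assert(ra < 32);
    return ra == 0 ? 0u : gpr[ra];
}

/* Register indirect with immediate index: EA = (rA|0) + EXTS(d). */
static uint32_t ea_immediate(const uint32_t gpr[32], unsigned ra, int16_t d)
{
    return ra_or_zero(gpr, ra) + (uint32_t)(int32_t)d;  /* sign-extend, wrap mod 2^32 */
}

/* Register indirect with index: EA = (rA|0) + (rB). */
static uint32_t ea_indexed(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    assert(rb < 32);
    return ra_or_zero(gpr, ra) + gpr[rb];
}
```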


Reservation. The processor establishes a reservation on a cache block of 
memory space when it executes an lwarx instruction to read a
memory semaphore into a GPR. 
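The reservation mechanism can be illustrated with a software model. The model does not issue real `lwarx` or `stwcx.` instructions, and it makes these simplifying assumptions:

- The reservation granule is 32 bytes, a size that is implementation-dependent.
- Only one processor's reservation is modeled.
- `stwcx.` is not checked against the address of the preceding `lwarx`; pairing them correctly is left to software.

All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RESV_GRANULE 32u   /* assumed reservation granule; implementation-dependent */

/* Software model of one processor's reservation. */
struct cpu_resv {
    bool     valid;
    uint32_t block;        /* address of the reserved granule */
};

static uint32_t granule(uint32_t addr) { return addr & ~(RESV_GRANULE - 1u); }

/* lwarx: load the word (addr must be word-aligned) and reserve its granule. */
static uint32_t model_lwarx(struct cpu_resv *r, const uint32_t *mem, uint32_t addr)
{
    r->valid = true;
    r->block = granule(addr);
    return mem[addr / 4];
}

/* stwcx.: store only if a reservation is held; the reservation is always
   cleared. The return value models CR0[EQ] (true = store performed). */
static bool model_stwcx(struct cpu_resv *r, uint32_t *mem, uint32_t addr, uint32_t val)
{
    bool ok = r->valid;
    r->valid = false;
    if (ok)
        mem[addr / 4] = val;
    return ok;
}

/* A store by another device to the reserved granule cancels the reservation. */
static void model_snoop_store(struct cpu_resv *r, uint32_t addr)
{
    if (r->valid && r->block == granule(addr))
        r->valid = false;
}
```

With this model, an atomic increment retries until the conditional store succeeds: `do { v = model_lwarx(&r, mem, a); } while (!model_stwcx(&r, mem, a, v + 1));`.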


Reserved field. In a register, a reserved field is one that is not assigned a 
function. A reserved field may be a single bit. The handling of 
reserved bits is implementation-dependent. Software is permitted to 
write any value to such a bit. A subsequent reading of the bit returns 
0 if the value last written to the bit was 0 and returns an undefined 
value (0 or 1) otherwise. 
RISC (reduced instruction set computing). An architecture characterized
by fixed-length instructions with nonoverlapping functionality and 
by a separate set of load and store instructions that perform memory 
accesses. 


Scalability. The capability of an architecture to generate implementations 
specific for a wide range of purposes, and in particular
implementations of significantly greater performance and/or 
functionality than at present, while maintaining compatibility with 
current implementations. 


Secondary cache. A cache memory that is typically larger and has a longer 
access time than the primary cache. A secondary cache may be 
shared by multiple devices. Also referred to as L2, or level-2, cache. 


Segment. A 256-Mbyte area of virtual memory that is the most basic memory 
space defined by the PowerPC architecture. Each segment is 
configured through a unique segment descriptor. 


Segment descriptors. Information used to generate the interim virtual 
address. The segment descriptors reside in 16 on-chip segment 
registers for 32-bit implementations. For 64-bit implementations, the 
segment descriptors reside as segment table entries in a hashed 
segment table in memory. 
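For 32-bit implementations, the segment register and page are selected directly from effective address bits. The sketch assumes 4-Kbyte pages, as described under Page:

- Bits 0-3 choose one of the 16 segment registers.
- Bits 4-19 form the page index within the 256-Mbyte segment.
- Bits 20-31 are the byte offset within the page.

The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Parts of a 32-bit effective address used by page address translation. */
struct ea_parts {
    unsigned sr;          /* segment register number, 0-15 (EA bits 0-3) */
    uint32_t page_index;  /* page within the segment, 0-65535 (EA bits 4-19) */
    uint32_t offset;      /* byte within the 4-Kbyte page, 0-4095 (EA bits 20-31) */
};

static struct ea_parts split_ea(uint32_t ea)
{
    struct ea_parts p;
    p.sr         = ea >> 28;
    p.page_index = (ea >> 12) & 0xFFFFu;
    p.offset     = ea & 0xFFFu;
    return p;
}
```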


Set (v). To write a nonzero value to a bit or bit field; the opposite of clear. The 
term ‘set’ may also be used to generally describe the updating of a 
bit or bit field. 


Set (n). A subdivision of a cache. Cacheable data can be stored in a given 
location in any one of the sets, typically corresponding to its lower- 
order address bits. Because several memory locations can map to the 
same location, cached data is typically placed in the set whose cache 
block corresponding to that address was used least recently. See Set- 
associative. 


Set-associative. Aspect of cache organization in which the cache space is 
divided into sections, called sets. The cache controller associates a 
particular main memory address with the contents of a particular set, 
or region, within the cache. 
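The sketch below uses the common convention in which the lower-order address bits just above the block offset select the set. It computes the set index and tag from an address. The block size and number of sets are assumptions chosen for illustration, not values taken from any PowerPC implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry of a hypothetical set-associative cache. */
#define CACHE_BLOCK_SIZE 32u    /* bytes per cache block */
#define CACHE_NUM_SETS   128u   /* each set holds one block per way */

/* The set is chosen by the address bits just above the block offset. */
static uint32_t cache_set(uint32_t addr)
{
    return (addr / CACHE_BLOCK_SIZE) % CACHE_NUM_SETS;
}

/* The remaining high-order bits form the tag compared within that set. */
static uint32_t cache_tag(uint32_t addr)
{
    return addr / (CACHE_BLOCK_SIZE * CACHE_NUM_SETS);
}
```

With this geometry, addresses 4 Kbytes apart map to the same set and differ only in their tags. That is why several memory locations can compete for the same set.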


Signaling NaN. A type of NaN that generates an invalid operation program 
exception when it is specified as an arithmetic operand. See Quiet
NaN. 
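Quiet and signaling NaNs differ only in the most-significant fraction bit. The sketch classifies a raw double-precision bit pattern. It assumes the convention used by PowerPC processors, in which that bit is 1 for a quiet NaN and 0 for a signaling NaN. The pattern is taken as an integer so that passing a signaling NaN through a `double` cannot quiet it. Names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum nan_kind { NOT_NAN, QUIET_NAN, SIGNALING_NAN };

/* A NaN has the maximum biased exponent (2047) and a nonzero fraction.
   The most-significant fraction bit separates quiet (1) from signaling (0). */
static enum nan_kind classify_nan(uint64_t bits)
{
    uint64_t exp  = (bits >> 52) & 0x7FF;
    uint64_t frac = bits & 0x000FFFFFFFFFFFFFull;
    if (exp != 0x7FF || frac == 0)
        return NOT_NAN;                       /* finite value or infinity */
    return (frac & 0x0008000000000000ull) ? QUIET_NAN : SIGNALING_NAN;
}
```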
Significand. The component of a binary floating-point number that consists
of an explicit or implicit leading bit to the left of its implied binary 
point and a fraction field to the right. 


Simplified mnemonics. Assembler mnemonics that represent a more 
complex form of a common operation. 


Static branch prediction. Mechanism by which software (for example, 
compilers) can give a hint to the machine hardware about the 
direction a branch is likely to take. 


Sticky bit. A bit that when set must be cleared explicitly. 


Strong ordering. A memory access model that requires exclusive access to 
an address before making an update, to prevent another device from 
using stale data. 


Superscalar machine. A machine that can issue multiple instructions 
concurrently from a conventional linear instruction stream. 


Supervisor mode. The privileged operation state of a processor. In 
supervisor mode, software, typically the operating system, can 
access all control registers and can access the supervisor memory 
space, among other privileged operations. 


Synchronization. A process to ensure that operations occur strictly in order. 
See Context synchronization and Execution synchronization. 


Synchronous exception. An exception that is generated by the execution of 
a particular instruction or instruction sequence. There are two types 
of synchronous exceptions, precise and imprecise. 


System memory. The physical memory available to a processor. 
T

TLB (translation lookaside buffer). A cache that holds recently-used page
table entries. 


Throughput. The measure of the number of instructions that are processed 
per clock cycle. 


Tiny. A floating-point value that is too small to be represented for a particular 
precision format, including denormalized numbers; tiny values do not
include ±0.


U

UISA (user instruction set architecture). The level of the architecture to
which user-level software should conform. The UISA defines the 
base user-level instruction set, user-level registers, data types, 
floating-point memory conventions and exception model as seen by 
user programs, and the memory and programming models. 


Underflow. An error condition that occurs during arithmetic operations when 
the result cannot be represented accurately in the destination register. 
For example, underflow can happen if two floating-point fractions 
are multiplied and the result requires a smaller exponent and/or 
mantissa than the single-precision format can provide. In other 
words, the result is too small to be represented accurately. 


Unified cache. Combined data and instruction cache. 


User mode. The unprivileged operating state of a processor used typically by 
application software. In user mode, software can only access certain 
control registers and can access only user memory space. No 
privileged operations can be performed. Also referred to as problem 
state. 


V

VEA (virtual environment architecture). The level of the architecture that
describes the memory model for an environment in which multiple 
devices can access memory, defines aspects of the cache model, 
defines cache control instructions, and defines the time-base facility 
from a user-level perspective. Implementations that conform to the 
PowerPC VEA also adhere to the UISA, but may not necessarily 
adhere to the OEA. 


Virtual address. An intermediate address used in the translation of an 
effective address to a physical address. 
Virtual memory. The address space created using the memory management
facilities of the processor. Program access to virtual memory is 
possible only when it coincides with physical memory. 


Weak ordering. A memory access model that allows bus operations to be 
reordered dynamically, which improves overall performance and in 
particular reduces the effect of memory latency on instruction 
throughput. 


Word. A 32-bit data element. 


Write-back. A cache memory update policy in which processor write cycles 
are directly written only to the cache. External memory is updated 
only indirectly, for example, when a modified cache block is cast out 
to make room for newer data. 


Write-through. A cache memory update policy in which all processor write 
cycles are written to both the cache and memory. 
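The two update policies can be contrasted with a toy model of one cached word. It is a sketch only, since real caches track whole blocks and coherency state. The structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one cached word backed by one memory word. */
struct cached_word {
    uint32_t cache;
    uint32_t memory;
    bool     modified;   /* cache copy differs from memory */
};

/* Write-through: every store updates both the cache and memory. */
static void store_write_through(struct cached_word *w, uint32_t v)
{
    w->cache  = v;
    w->memory = v;
}

/* Write-back: a store updates only the cache and marks the block modified. */
static void store_write_back(struct cached_word *w, uint32_t v)
{
    w->cache    = v;
    w->modified = true;
}

/* Memory receives write-back data only when the modified block is cast out. */
static void cast_out(struct cached_word *w)
{
    if (w->modified) {
        w->memory   = w->cache;
        w->modified = false;
    }
}
```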


PowerPC Microprocessor Family: The Programming Environments (32-Bit) 


A 


Accesses 
access order, 5-2 
atomic accesses (guaranteed), 5-4 
atomic accesses (not guaranteed), 5-4 
misaligned accesses, 3-1 
Acronyms and abbreviated terms, list, xxxiii 
add, 4-11, 8-10 
addc, 4-12, 8-11 
adde, 4-12, 8-12 
addi, 4-11, 8-13, F-23 
addic, 4-11, 8-14 
addic., 4-11, 8-15 
addis, 4-11, 8-16, F-23 
addme, 4-12, 8-17 
Address calculation 
branch instructions, 4-41 
load and store instructions, 4-29 
Address mapping examples, PTEG, 7-58 
Address translation, see Memory management unit 
Addressing conventions 
alignment, 3-1 
byte ordering, 3-2, 3-6 
I/O data transfer, 3-11
instruction memory addressing, 3-10 
mapping examples, 3-3 
memory operands, 3-2 
Addressing modes 
branch conditional to absolute, 4-44 
branch conditional to count register, 4-46, B-4 
branch conditional to link register, 4-45 
branch conditional to relative, 4-42 
branch relative, 4-42 
branch to absolute, 4-43 
register indirect 
integer, 4-30 
with immediate index, floating-point, 4-37 
with immediate index, integer, 4-29 
with index, floating-point, 4-38 
with index, integer, 4-30 
addze, 4-13, 8-18 
Aligned data transfer, 1-10, 3-1 
Aligned scalars, LE mode, 3-6 
Alignment
AL bit in MSR, POWER, B-2 
alignment exception 
description, 6-27 
integer alignment exception, 6-30 
interpreting the DSISR settings, 6-31 
LE mode alignment exception, 6-30 
MMU-related exception, 7-16 
overview, 6-4 
partially executed instructions, 6-11 
register settings, 6-28 
alignment for load/store multiple, B-5 
rules, 3-1, 3-6
and, 4-16, 8-19 
andc, 4-17, 8-20 
andi., 4-16, 8-21 
andis., 4-16, 8-22 
Architecture, xxv 
Arithmetic instructions 
floating-point, 4-21, A-17 
integer, 4-2, 4-11, A-14 
Asynchronous exceptions 
causes, 6-3 
classifications, 6-3 
decrementer exception, 6-5, 6-9, 6-35 
external interrupt, 6-4, 6-9, 6-27 
machine check exception, 6-4, 6-8, 6-22 
system reset, 6-4, 6-8, 6-21 
types, 6-8 
Atomic memory references 
atomicity, 5-4 
ldarx/stdcx., 4-53, 5-4, E-1
lwarx/stwcx., 4-53, 5-4, E-1


B


b, 4-49, 8-23
BAT registers, see Block address translation 
bc, 4-49, 8-24
bcctr, 4-50, 8-26
bclr, 4-50, 8-28
Biased exponent format, 3-17 
Big-endian mode 
blocks, 7-3 
byte ordering, 1-9, 3-2 
concept, 3-2 
mapping, 3-4 
memory operand placement, 3-13 
Block address translation
BAT array 
access protection summary, 7-29 
address recognition, 7-22 
BAT register implementation, 7-24 
fully-associative BAT arrays, 7-20 
organization, 7-20 
BAT registers 
access translation, 2-29 
BAT area lengths 
general information, 2-24 
implementation of BAT array, 7-24 
WIMG bits, 2-25, 5-13, 7-26 
block address translation flow, 7-11, 7-32 
block memory protection, 7-27—7-30, 7-42 
block size options, 7-26 
definition, 2-24, 7-7 
selection of block address translation, 7-7, 7-22 
summary, 7-32 
BO operand encodings, 2-13, 4-47, B-3 
Boundedly undefined, definition, 4-4 
Branch instructions 
address calculation, 4-41 
BO operand encodings, 2-13, 4-47 
branch conditional 
absolute addressing mode, 4-44 
CTR addressing mode, 4-46, B-4 
LR addressing mode, 4-45 
relative addressing mode, 4-42 
branch instructions, 4-49, A-22, F-6 
branch, relative addressing mode, 4-42 
condition register logical, 4-50, A-23, F-18 
conditional branch control, 4-47 
description, 4-49, A-22 
simplified mnemonics, F-6 
system linkage, 4-52, 4-63, A-23 
trap, 4-51, A-23 
branch instructions 
BO operand encodings, B-3 
Byte ordering 
aligned scalars, LE mode, 3-6 
big-endian mode, default, 3-2, 3-6
concept, 3-2 
default, 1-9, 4-7 
LE and ILE bits in MSR, 1-10, 3-6 
least-significant bit (lsb), 3-26
least-significant byte (LSB), 3-2 
little-endian mode 
description, 3-3 
instruction addressing, 3-10 
misaligned scalars, LE mode, 3-9 
most-significant byte (MSB), 3-2 
nonscalars, 3-10


C


Cache
atomic access, 5-4 
block, definition, 5-1 
cache coherency maintenance, 5-1 
cache model, 5-1, 5-5 
clearing a cache block, 5-9 
Harvard cache model, 5-5 
synchronization, 5-3 
unified cache, 5-5 
Cache block, definition, 5-1 
Cache coherency 
copy-back operation, 5-14 
memory/cache access modes, 5-6 
WIMG bits, 5-12, 7-65 
write-back mode, 5-14 
Cache implementation, 1-13 
Cache management instructions 
dcbf, 4-61, 5-10, 8-45
dcbi, 4-66, 5-19, 8-47
dcbst, 4-60, 5-9, 8-48
dcbt, 4-59, 5-8, 8-49
dcbtst, 4-59, 5-8, 8-50
dcbz, 4-59, 4-60, 5-9, 8-51
eieio, 4-58, 5-2, 8-61 
icbi, 4-61, 5-11, 8-98 
isync, 4-58, 5-11, 8-99 
list of instructions, 4-59, 4-66, A-24 
Cache model, Harvard, 5-5 
Caching-inhibited attribute (I) 
caching-inhibited/-allowed operation, 5-6, 5-14 
Changed (C) bit maintenance 
page history information, 7-11 
recording, 7-11, 7-38, 7-40
updates, 7-64 
Changes in this revision, summary, 1-7, 1-15 
Classes of instructions, 4-3
Classifications, exception, 6-3 
cmp, 4-15, 8-30 
cmpi, 4-15, 8-31 
cmpl, 4-15, 8-32 
cmpli, 4-15, 8-33 
cntlzw, 4-17, 8-34
Coherence block, definition, 5-1 
Compare and swap primitive, E-4 
Compare instructions 
floating-point, 4-25, A-18 
integer, 4-15, A-14 
simplified mnemonics, F-3 
Computation modes 
effective address, 4-3 
PowerPC architecture, 1-4, 4-3 
Conditional branch control, 4-47 
Context synchronization
data access, 2-37 
description, 6-6 
exception, 2-36 
instruction access, 2-38 
requirements, 2-36 
return from exception handler, 6-19 
Context-altering instruction, definition, 2-36 
Context-synchronizing instructions, 2-36, 4-8 
Conventions 
instruction set 
classes of instructions, 4-3 
computation modes, 4-3 
memory addressing, 4-7 
sequential execution model, 4-3 
operand conventions 
architecture levels represented, 3-1 
biased exponent values, 3-19 
significand value, 3-17 
tiny, definition, 3-18 
underflow/overflow, 3-16 
terminology, xxxv 
CR (condition register) 
bit fields, 2-5 
CR bit and identification symbols, F-1 
CR logical instructions, 4-50, A-23 
CR settings, 4-26, B-2 
CR0/CR1 field definitions, 2-6
CRn field, compare instructions, 2-7 
move to/from CR instructions, 4-52 
simplified mnemonics, F-18 
CR logical instructions, 4-50, A-23, F-18 
crand, 4-50, 8-35 
crandc, 4-51, 8-36 
creqv, 4-51, 8-37 
crnand, 4-50, 8-38
crnor, 4-51, 8-39 
cror, 4-50, 8-40 
crorc, 4-51, 8-41 
crxor, 4-50, 8-42 
CTR (count register) 
BO operand encodings, 2-13 
branch conditional to count register, 4-46, B-4


D


DABR (data address breakpoint register), 2-34, 6-24
DAR (data address register) 
alignment exception register settings, 6-29 
description, 2-29 
DSI exception register settings, 6-25 
Data cache 
clearing bytes, B-7 
instructions, 5-8 
Data cache block allocate instruction, 8-43 
Data handling and precision, 3-24
Data organization, memory, 3-1 
Data transfer 
aligned data transfer, 1-10, 3-1 
I/O data transfer addressing, LE mode, 3-11 
Data types 
aligned scalars, 3-6 
misaligned scalars, 3-9 
nonscalars, 3-10 
dcba, 8-43 
dcbf, 4-61, 5-10, 8-45
dcbi, 4-66, 5-19, 8-47
dcbst, 4-60, 5-9, 8-48
dcbt, 4-59, 5-8, 8-49
dcbtst, 4-59, 5-8, 8-50
dcbz, 4-59, 4-60, 5-9, 8-51, B-7
DEC (decrementer register) 
decrementer operation, 2-33 
POWER and PowerPC, B-9 
writing and reading the DEC, 2-34 
Decrementer exception, 6-5, 6-9, 6-35 
Defined instruction class, 4-4 
Denormalization, definition, 3-23 
Denormalized numbers, 3-20 
Direct-store segment 
description, 7-68 
direct-store address translation 
definition, 7-7 
selection, 7-9, 7-13, 7-34, 7-68 
direct-store facility, 7-7 
I/O interface considerations, 5-19
instructions not supported, 7-69 
integer alignment exception, 6-30 
key bit description, 7-10 
key/PP combinations, conditions, 7-44 
no-op instructions, 7-70 
protection, 7-10 
segment accesses, 7-69 
translation summary flow, 7-70 
divw, 4-14, 8-53 
divwu, 4-14, 8-55 
DSI exception 
description, 6-4 
partially executed instructions, 6-11, 6-23 
DSISR register 
settings for alignment exception, 6-29 
settings for DSI exception, 6-25 
settings for misaligned instruction, 6-31


E


EAR (external access register)
bit format, 2-36 
eciwx, 4-62, 8-57 
ecowx, 4-62, 8-59 
Effective address calculation 
address translation, 2-29, 7-1 
branches, 4-7, 4-41 
EA modifications, 3-7 
loads and stores, 4-7, 4-29, 4-37 
eieio, 4-58, 5-2, 8-61 
eqv, 4-17, 8-63 
Exceptions 
alignment exception, 6-4, 6-27 
asynchronous exceptions, 6-3, 6-8 
classes of exceptions, 6-3, 6-12 
conditions for key/PP combinations, 7-44 
context synchronizing exception, 2-36 
decrementer exception, 6-5, 6-9, 6-35 
DSI exception, 6-4, 6-11, 6-23 
enabling/disabling exceptions, 6-17 
exception classes, 6-3, 6-12 
exception conditions 
inexact, 3-43 
invalid operation, 3-37 
MMU exception conditions, 7-16 
overflow, 3-41 
overview, 6-4 
program exception conditions, 6-5, 6-33
recognizing/handling, 6-1 
underflow, 3-42 
zero divide, 3-38 
exception definitions, 6-20 
exception model, overview, 1-13 
exception priorities, 6-12 
exception processing 
description, 6-14 
stages, 6-2 
steps, 6-18 
exceptions, effects on FPSCR, B-6 
external interrupt, 6-4, 6-9, 6-27 
FP assist exception, 6-5, 6-39 
FP exceptions, B-8 
FP program exceptions, 3-28, 6-5, 6-33
FP unavailable exception, 6-5, 6-34 
FPECR register, 2-20 
IEEE FP enabled program exception
condition, 6-5, 6-33 
illegal instruction program exception 
condition, 6-5, 6-33 
imprecise exceptions, 6-9 
instruction causing conditions, 4-9 
integer alignment exception, 6-30 
ISI exception, 6-4, 6-26 
LE mode alignment exception, 6-30 
machine check exception, 6-4, 6-8, 6-22 
MMU-related exceptions, 7-15 
overview, 1-13 
precise exceptions, 6-6 
privileged instruction type program exception 
condition, 6-5, 6-33 
program exception 
conditions, 6-5, 6-33
register settings 
FPSCR, 3-28 
MSR, 6-20 
SRR0/SRR1, 6-14
reset exception, 6-4, 6-8, 6-21
return from exception handler, 6-19 
summary, 4-9, 6-4 
synchronous/precise exceptions, 6-3, 6-7 
system call exception, 6-5, 6-36 
terminology, 6-2 
trace exception, 6-5, 6-37 
translation exception conditions, 7-15 
trap program exception condition, 6-5, 6-34 
vector offset table, 6-4 
Exclusive OR (XOR), 3-6 
Execution model 
floating-point, 3-15 
IEEE operations, D-1 
in-order execution, 5-16 
multiply-add instructions, D-4 
out-of-order execution, 5-16 
sequential execution, 4-3 
Execution synchronization, 4-9, 6-7 
Extended mnemonics, see Simplified mnemonics 
Extended/primary opcodes, 4-4 
External control instructions, 4-62, 8-57—8-59, A-25 
External interrupt, 6-4, 6-9, 6-27 
extsb, 4-17, 8-64 
extsh, 4-17, 8-65 


F 

fabs, 4-28, 8-66 
fadd, 4-21, 8-67 
fadds, 4-21, 8-68 
fcmpo, 4-26, 8-69
fcmpu, 4-26, 8-70
fctiw, 4-25, 8-71
fctiwz, 4-25, 8-72 
fdiv, 4-22, 8-73 
fdivs, 4-22, 8-74 
Floating-point model
biased exponent format, 3-17 
binary FP numbers, 3-19 
data handling, 3-24 
denormalized numbers, 3-20
execution model 
floating-point, 3-15 
IEEE operations, D-1 
multiply-add instructions, D-4 
FE0/FE1 bits, 2-22
FP arithmetic instructions, 4-21, A-17 
FP assist exceptions, 6-5 
FP compare instructions, 4-25, A-18 
FP data formats, 3-16 
FP execution model, 3-15 
FP load instructions, 4-38, A-21, D-15 
FP move instructions, 4-28, A-22 
FP multiply-add instructions, 4-23, A-17 
FP program exceptions 
description, 3-28, 6-33 
exception conditions, 6-5 
FE0/FE1 bits, 6-10
POWER/PowerPC, MSR bit 20, B-8 
FP rounding/conversion instructions, 4-25, A-18 
FP store instructions, 4-40, A-22, B-7, D-16 
FP unavailable exception, 6-5, 6-34 
FPR0-FPR31, 2-4
FPSCR instructions, 4-26, A-18 
IEEE floating-point fields, 3-17 
IEEE 754 compatibility, 1-10, 3-17
infinities, 3-21 
models for FP instructions, D-6 
NaNs, 3-21 
normalization/denormalization, 3-23 
normalized numbers, 3-19 
precision handling, 3-24 
program exceptions, 3-28 
recognized FP numbers, 3-18 
rounding, 3-25 
sign of result, 3-22 
single-precision representation in FPR, 3-25 
value representation, FP model, 3-18 
zero values, 3-20 
Flow control instructions 
branch instruction address calculation, 4-41 
condition register logical, 4-50 
system linkage, 4-52, 4-63 
trap, 4-51 
fmadd, 4-23, 8-75 
fmadds, 4-24, 8-76
fmr, 4-28, 8-77 
fmsub, 4-24, 8-78 
fmsubs, 4-24, 8-79 
fmul, 4-22, 8-80 
fmuls, 4-22, 8-81
fnabs, 4-28, 8-82
fneg, 4-28, 8-83 
fnmadd, 4-24, 8-84 
fnmadds, 4-24, 8-85
fnmsub, 4-24, 8-86
fnmsubs, 4-24, 8-87
FP assist exception, 6-39 
FP exceptions, 6-34, 6-39 
FPCC (floating-point condition code), 4-25 
FPECR (floating-point exception cause register), 2-32 
FPR0-FPR31 (floating-point registers), 2-4
FPSCR (floating-point status and control register) 
bit settings, 2-8, 3-29 
FP result flags in FPSCR, 3-31 
FPCC, 4-25 
FPSCR instructions, 4-26, A-18 
FR and FI bits, effects of exceptions, B-6 
move from FPSCR, B-7 
RN field, 3-26 
fres, 4-22, 8-88 
frsp, 3-24, 4-25, 8-90 
frsqrte, 4-23, 8-91 
fsel, 4-23, 8-93, D-5 
fsqrt, 4-22, 8-94 
fsqrts, 4-22, 8-95 
fsub, 4-21, 8-96 
fsubs, 4-21, 8-97 


G 


GPR0-GPR31 (general purpose registers), 2-3
Graphics instructions
fres, 4-22, 8-88
frsqrte, 4-23, 8-91
fsel, 4-23, 8-93
stfiwx, 4-41, 8-185
Guarded attribute (G)
G-bit operation, 5-7, 5-16
guarded memory, 5-17
out-of-order execution, 5-16


H 


Harvard cache model, 5-5 
Hashed page tables, 7-48 
Hashing functions 
page table 
primary PTEG, 7-52, 7-59 
secondary PTEG, 7-52, 7-60 


I


I/O data transfer addressing, LE mode, 3-11 
I/O interface considerations 
direct-store operations, 5-19 
memory-mapped I/O interface operations, 5-19 
icbi, 4-61, 5-11, 8-98
IEEE 64-bit execution model, D-1 
IEEE FP enabled program exception 
condition, 6-5, 6-33 
Illegal instruction class, 4-6 
Illegal instruction program exception 
condition, 6-5, 6-33 
Imprecise exceptions, 6-9 
Inexact exception condition, 3-43 
In-order execution, 5-16 
Instruction addressing 
LE mode examples, 3-11 
Instruction cache instructions, 5-10 
Instruction restart, 3-14 
Instruction set conventions 
classes of instructions, 4-3 
computation modes, 4-3 
memory addressing, 4-7 
sequential execution model, 4-3 
Instructions 
64-bit bridge instructions 
optional instructions, 4-5 
boundedly undefined, definition, 4-4 
branch instructions 
branch address calculation, 4-41 
branch conditional 
absolute addressing mode, 4-44 
CTR addressing mode, 4-46 
LR addressing mode, 4-45 
relative addressing mode, 4-42 
branch instructions, 4-49, A-22, F-6 
condition register logical, 4-50 
conditional branch control, 4-47 
description, 4-49, A-22 
effective address calculation, 4-41 
system linkage, 4-52, 4-63 
trap, 4-51 
cache management instructions 
dcbf, 4-61, 5-10, 8-45
dcbi, 4-66, 5-19, 8-47
dcbst, 4-60, 5-9, 8-48
dcbt, 4-59, 5-8, 8-49
dcbtst, 4-59, 5-8, 8-50
dcbz, 4-59, 4-60, 5-9, 8-51
eieio, 4-58, 5-2, 8-61 
icbi, 4-61, 5-11, 8-98 
isync, 4-58, 5-11, 8-99 
list of instructions, 4-59, 4-66, A-24 
classes of instructions, 4-3 
condition register logical, 4-50, A-23 
conditional branch control, 4-47 
context-altering instructions, 2-36 
context-synchronizing instructions, 2-36, 4-8 
defined instruction class, 4-4 
execution synchronization, 3-35 
external control instructions, 4-5, 4-62, A-25 
floating-point 
arithmetic, 4-21, 8-73, A-17 
compare, 4-25, 8-69, A-18, F-3 
computational instructions, 3-15 
FP conversions, D-5 
FP load instructions, 4-38, A-21, D-15 
FP move instructions, 4-28, A-22 
FP store instructions, A-22, B-7, D-16 
FPSCR instructions, 4-26, A-18 
models for FP instructions, D-6 
multiply-add, 4-23, A-17, D-4 
noncomputational instructions, 3-15 
rounding/conversion, 4-25, ??-8-72, A-18 
flow control instructions 
branch address calculation, 4-41 
CR logical, 4-50 
system linkage, 4-52, 4-63 
trap, 4-51 
graphics instructions 
fres, 4-22, 8-88 
frsqrte, 4-23, 8-91 
fsel, 4-23, 8-93 
stfiwx, 4-41, 8-185 
illegal instruction class, 4-6 
instruction fetching 
branch/flow control instructions, 4-41 
direct-store segment, 7-15 
exception processing steps, 6-18 
exception synchronization steps, 6-6 
instruction cache instructions, 5-10 
integer store instructions, 4-33 
multiprocessor systems, 5-11 
precise exceptions, 6-6 
uniprocessor systems, 5-10 
instruction field conventions, xxxvi 
instructions not supported, direct-store, 7-69 
integer 
arithmetic, 4-2, 4-10, A-14 
compare, 4-15, A-14, F-3 
load, 4-31, A-19
load/store multiple, 4-35, A-20, B-5 
load/store string, 4-36, A-20, B-5 
load/store with byte reverse, 4-34, A-20 
logical, 4-2, 4-16, A-15 
rotate/shift, 4-18–4-19, A-16, F-4
store, 4-33, A-20 
invalid instruction forms, 4-5 
load and store
address generation, floating-point, 4-37 
address generation, integer, 4-29 
byte reverse instructions, 4-34, A-20 
floating-point load, 4-38, A-21 
floating-point move, 4-28, A-22 
floating-point store, 4-40, B-7 
integer load, 4-31, A-19
integer store, 4-33, A-20 
memory synchronization, 4-53, 4-55, 4-57, A-21 
multiple instructions, 4-35, A-20, B-5 
string instructions, 4-36, A-20, B-5 
lookaside buffer management 
instructions, 4-65, 4-67, A-25 
memory control instructions, 4-58, 4-65 
memory synchronization instructions 
eieio, 4-58, 5-2, 8-61 
isync, 4-58, 5-11, 8-99 
list of instructions, 4-55, 4-57, A-21 
lwarx, 4-55, 8-126 
stwcx., 4-55, 8-200
sync, 4-55, 5-3, 8-211, B-5 
new instructions 
mtmsrd, 7-65 
no-op, 4-4, F-23 
optional instructions, 4-5 
partially executed instructions, 6-11 
POWER instructions 
deleted in PowerPC, B-9 
supported in PowerPC, B-11 
PowerPC instructions, list, A-1, A-8, A-14 
preferred instruction forms, 4-4 
processor control 
instructions, 4-52, 4-56, 4-64, A-24 
reserved bits, POWER and PowerPC, B-2 
reserved instructions, 4-6 
segment register manipulation 
instructions, 4-66, A-25 
SLB management instructions, 4-67 
supervisor-level cache management 
instructions, 4-65 
supervisor-level instructions, 4-9 
system linkage instructions, 4-52, 4-63, A-23 
TLB management instructions, 4-67, A-25 
trap instructions, 4-51, A-23 
Integer alignment exception, 6-30 
Integer arithmetic instructions, 4-2, 4-10, A-14 
Integer compare instructions, 4-15, A-14, F-3 
Integer load instructions, 4-31, A-19
Integer logical instructions, 4-2, 4-16, A-15 
Integer rotate and shift instructions, F-4 
Integer rotate/shift 
instructions, 4-18–4-19, A-16, F-4
Integer store instructions 
description, 4-33 
instruction fetching, 4-33 
list, A-20 
Interrupts, see Exceptions 
Invalid instruction forms, 4-5 
Invalid operation exception condition, 3-37 
ISI exception, 6-4, 6-26 
isync, 4-58, 5-11, 8-99 


K 
Key (Ks, Kp) protection bits, 7-42 


L 

lbz, 4-32, 8-100
lbzu, 4-32, 8-101
lbzux, 4-32, 8-102
lbzx, 4-32, 8-103
ldarx/stdcx.
general information, 5-4, E-1
lfd, 4-39, 8-104
lfdu, 4-39, 8-105
lfdux, 4-39, 8-106
lfdx, 4-39, 8-107
lfs, 4-39, 8-108
lfsu, 4-39, 8-109
lfsux, 4-39, 8-110
lfsx, 4-39, 8-111
lha, 4-32, 8-112
lhau, 4-32, 8-113
lhaux, 4-32, 8-114
lhax, 4-32, 8-115
lhbrx, 4-35, 8-116
lhz, 4-32, 8-117
lhzu, 4-32, 8-118
lhzux, 4-32, 8-119
lhzx, 4-32, 8-120

Little-endian mode 
alignment exception, 6-30 
byte ordering, 3-3, 3-6 
description, 3-3 
I/O data transfer addressing, 3-11 
instruction addressing, 3-10 
LE and ILE bits, 3-6 
mapping, 3-5 
misaligned scalars, 3-9 
munged structure S, 3-7–3-8
LK bit, inappropriate use, B-3
lmw, 4-36, 8-121, B-5
Load/store
address generation, floating-point, 4-38 
address generation, integer, 4-29 
byte reverse instructions, 4-34, A-20 
floating-point load instructions, 4-38, A-21 
floating-point move instructions, 4-28, A-22 
floating-point store instructions, 4-40, A-22, B-7 
integer load instructions, 4-31, A-19
integer store instructions, 4-33, A-20 
load/store multiple instructions, 4-35, A-20, B-5 
memory synchronization instructions, 4-53, A-21 
string instructions, 4-36, A-20, B-5 
Logical addresses 
translation into physical addresses, 7-1 
Logical instructions, integer, 4-2, 4-16, A-15 
Lookaside buffer management 
instructions, 4-65, 4-67, A-25 
lswi, 4-36, 8-122, B-5
lswx, 4-36, 8-124, B-5
lwarx, 4-53, 4-55, 8-126
lwarx/stwcx.
general information, 5-4, E-1
list insertion, E-6
lwarx, 4-55, 8-126
semaphores, 4-53
stwcx., 4-55, 8-200
synchronization primitive examples, E-2 
lwbrx, 4-35, 8-127 
lwz, 4-32, 8-128 
lwzu, 4-33, 8-129 
lwzux, 4-33, 8-130 
lwzx, 4-32, 8-131 


M


Machine check exception 
causing conditions, 6-4, 6-8, 6-22 
non-recoverable, causes, 6-22 
register settings, 6-23 
mcrf, 4-51, 8-132
mcrfs, 4-27, 8-133
mcrxr, 4-52, 8-134
Memory access 
ordering, 5-2 
update forms, B-4 
Memory addressing, 4-7
Memory coherency
coherency controls, 5-5
coherency precautions, 5-7
M-bit operation, 5-7, 5-15
memory access modes, 5-6
sync instruction, 5-3
Memory control instructions
segment register manipulation, 4-66, A-25
SLB management, 4-67
supervisor-level cache management, 4-65
TLB management, 4-67
user-level cache, 4-58
Memory management unit
address translation flow, 7-11
address translation mechanisms, 7-7, 7-11
address translation types, 7-8
block address translation, 7-7, 7-11, 7-20
conceptual block diagram, 7-6
direct-store address translation, 7-13, 7-68
exceptions summary, 7-15
hashing functions, 7-52
instruction summary, 7-17
memory addressing, 7-4
memory protection, 7-9, 7-30, 7-42
MMU exception conditions, 7-16
MMU organization, 7-5
MMU registers, 7-18
MMU-related exceptions, 7-15
overview, 1-14, 7-3
page address translation, 7-7, 7-13, 7-46
page history status, 7-11, 7-38, 7-40
page table search operation, 7-48
real addressing mode translation, 7-11, 7-19, 7-33
register summary, 7-18
segment model, 7-32
Memory operands, 3-2, 4-7
Memory segment model
description, 7-32 
memory segment selection, 7-33 
page address translation 
overview, 7-34 
PTE definitions, 7-37 
segment descriptor definitions, 7-35 
summary, 7-46 
page history recording 
changed (C) bit, 7-40 
description, 7-38 
referenced (R) bit, 7-39 
table search operations, update history, 7-39 
page memory protection, 7-42 
recognition of addresses, 7-33 
referenced/changed bits 
changed (C) bit, 7-40 
guaranteed bit settings, model, 7-41 
recording scenarios, 7-40 
referenced (R) bit, 7-39 
synchronization of updates, 7-42 
table search operations, update history, 7-39 
updates to page tables, 7-64 
Memory synchronization
eieio, 4-58, 5-2, 8-61
isync, 4-58, 5-11, 8-99
list of instructions, 4-55, 4-57, A-21
lwarx, 4-53, 4-55, 8-126
stwcx., 4-53, 4-55, 8-200
sync, 4-55, 5-3, 8-211, B-5 
Memory, data organization, 3-1 
Memory/cache access modes, see WIMG bits 
mfcr, 4-52, 8-135 
mffs, 4-27, 8-136 
mfmsr, 4-64, 8-137, B-1 
mfspr, 4-53, 4-64, 8-138, B-6 
mfsr (64-bit bridge), 4-67, 8-141, B-1 
mfsrin (64-bit bridge), 4-67, 8-142 
mftb, 4-56, 8-143 
Migration to PowerPC, B-1 
Misaligned accesses and alignment, 3-1 
Mnemonics 
recommended mnemonics, F-23 
simplified mnemonics, F-1 
Move to/from CR instructions, 4-52 
MSR (machine state register) 
EE bit, 6-17 
FE0/FE1 bits, 2-22, 6-10
FE0/FE1 bits and FP exceptions, 3-34
LE and ILE bits, 1-10, 3-6 
RI bit, 6-19 
settings due to exception, 6-20 
mtcrf, 4-52, 8-145
mtfsb0, 4-27, 8-146 
mtfsb1, 4-27, 8-147 
mtfsf, 4-27, 8-148 
mtfsfi, 4-27, 8-149 
mtmsr (64-bit bridge), 4-64, 8-150 
mtmsrd, 7-65 
mtspr, 4-53, 4-64, 8-151, B-6 
mtsr (64-bit bridge), 4-67, 8-154 
mtsrin (64-bit bridge), 4-67, 8-155 
mulhw, 4-14, 8-156 
mulhwu, 4-14, 8-157 
mulli, 4-13, 8-158 
mullw, 4-14, 8-159 
Multiple register loads, B-5 
Multiple-precision shift examples, C-1 
Multiply-add 
execution model, D-4 
instructions, floating-point, 4-23, A-17 
Multiprocessor, usage, 5-1 
Munging 
description, 3-6 
LE mapping, 3-7-3-8 


N

nand, 4-17, 8-160
NaNs (Not a Numbers), 3-21
neg, 4-13, 8-161
No-execute protection, 7-9, 7-12
Nonscalars, 3-10
No-op, 4-4, F-23
nor, 4-17, 8-162
Normalization, definition, 3-23
Normalized numbers, 3-19


O 


OEA (operating environment architecture) 
cache model and memory coherency, 5-1 
definition, xxvi, 1-5 
general changes to the architecture, 1-17
implementing exceptions, 6-1 
memory management specifications, 7-1 
programming model, 2-18 
register set, 2-17 
Opcodes, primary/extended, 4-4
Operands
BO operand encodings, 2-13, 4-47, B-3
conventions, description, 1-9, 3-1
memory operands, 4-7
placement
effect on performance, summary, 3-12
instruction restart, 3-14
Operating environment architecture, see OEA
Optional instructions, 4-5, A-36
or, 4-16, 8-163
orc, 4-17, 8-164
ori, 4-16, 8-165
oris, 4-16, 8-166
Out-of-order execution, 5-16
Overflow exception condition, 3-41


P 


Page address translation 
definition, 7-7 
integer alignment exception, 6-30 
overview, 7-34 
page address translation flow, 7-46 
page memory protection, 7-28, 7-42 
page size, 7-32 
page tables in memory, 7-48 
PTE definitions, 7-37 
segment descriptors, 7-33, 7-35 
selection of page address translation, 7-7, 7-13 
summary, 7-46 
Page history status
making R and C bit updates to page tables, 7-64 
R and C bit recording, 7-11, 7-38, 7-40 
R and C bit updates, 7-64 
Page memory protection, see Protection of memory 
areas 
Page tables 
allocation of PTEs, 7-56 
definition, 7-49 
example table structures, ??—7-58 
hashed page tables, 7-48 
hashing functions, 7-52, 7-59 
organized as PTEGs, 7-49 
page table size, 7-51 
page table structure summary, 7-56 
page table updates, 7-64 
PTEG addresses, 7-54, 7-58 
table search flow, 7-62 
table search for PTE, 7-61 
Page, definition, 5-5 
Performance 
effect of operand placement, summary, 3-12 
instruction restart, 3-14 
Physical address generation 
generation of PTEG addresses, 7-54, 7-58 
memory management unit, 7-1 
Physical memory 
physical vs. virtual memory, 5-1 
predefined locations, 7-4 
PIR (processor identification register), 2-36 
POWER architecture
AL bit in MSR, B-2
alignment for load/store multiple, B-5
branch conditional to CTR, B-4
differences in implementations, B-4
FP exceptions, B-8
instructions
dclz/dcbz instructions, differences, B-7
deleted in PowerPC, B-9
load/store multiple, alignment, B-5
load/store string instructions, B-5
move from FPSCR, B-7
move to/from SPR, B-6
reserved bits, POWER and PowerPC, B-2
SR instructions, differences from PowerPC, B-7
supported in PowerPC, B-11
svcx/sc instructions, differences, B-4
memory access update forms, B-4
migration to PowerPC, B-1
POWER/PowerPC incompatibilities, B-1
registers
CR settings, B-2
decrementer register, B-9
multiple register loads, B-5
reserved bits, POWER and PowerPC, B-2
RTC (real-time clock), B-8
synchronization, B-5
timing facilities, POWER and PowerPC, B-8
TLB entry invalidation, B-8
PowerPC architecture
alignment for load/store multiple, B-5
byte ordering, 3-6
cache model, Harvard, 5-5
changes in this revision, summary, 1-7, 1-15
computation modes, 1-4, 4-3
differences in implementations, B-4
features summary
defined features, 1-3, 1-6
features not defined, 1-7
I/O data transfer addressing, 3-11
instruction addressing, 3-10
instruction list, A-1, A-8, A-14
instructions
dcbz/dclz instructions, differences, B-7
deleted in POWER, B-9
load/store multiple, alignment, B-5
load/store string instructions, B-5
move from FPSCR, B-7
move to/from SPR, B-6
reserved bits, POWER and PowerPC, B-2
SR instructions, differences from POWER, B-7
supported in POWER, B-11
svcx/sc instructions, differences, B-4
levels of the PowerPC architecture, 1-5–1-6
memory access update forms, B-4
operating environment architecture, xxvi, 1-5
overview, 1-2
POWER/PowerPC, incompatibilities, B-1
registers
CR settings, B-2
decrementer register, B-9
multiple register loads, B-5
programming model, 1-8, 2-2, 2-14, 2-18
reserved bits, POWER and PowerPC, B-2
synchronization, B-5
timing facilities, POWER and PowerPC, B-8
TLB entry invalidation, B-8
user instruction set architecture, xxv, 1-5
virtual environment architecture, xxv, 1-5
PP protection bits, 7-42
Precise exceptions, 6-3, 6-6, 6-7
Preferred instruction forms, 4-4
Primary/extended opcodes, 4-4
Priorities, exception, 6-12
Privilege levels
external control instructions, 4-62 
supervisor/user mode, 1-9 
supervisor-level cache control instruction, 4-65 
TBR encodings, 4-56 
user-level cache control instructions, 4-58 
Privileged instruction type program exception 
condition, 6-5, 6-33 
Privileged state, see Supervisor mode 
Problem state, see User mode 
Process switching, 6-19 
Processor control instructions, 4-52, 4-56, 4-64, A-24 
Program exception 
description, 3-28, 6-5, 6-33
five (5) program exception conditions, 6-5, 6-33 
move to/from SPR, B-6 
Programming model 
all registers (OEA), 2-18 
user-level plus time base (VEA), 2-14 
user-level registers (UISA), 2-2 
Protection of memory areas 
block access protection, 7-27, 7-28, 7-30, 7-42 
direct-store segment protection, 7-10, 7-69 
no-execute protection, 7-9, 7-12 
options available, 7-9, 7-42 
page access protection, 7-28, 7-30, 7-42 
programming protection bits, 7-42 
protection violations, 7-15, 7-30, 7-43 
PTEGs (PTE groups) 
definition, 7-49 
example primary and secondary PTEGs, 7-58 
generation of PTEG addresses, 7-54 
table search operation, 7-61 
PTEs (page table entries) 
adding a PTE, 7-65 
modifying a PTE, 7-66 
page table definition, 7-49 
page table search operation, 7-61 
page table updates, 7-64 
PTE bit definitions, 7-38 
PVR (processor version register), 2-23 


Q 


Quiet NaNs (QNaNs) 
description, 3-21 
representation, 3-22 


R 


Real address (RA), see Physical address generation 
Real addressing mode address translation (translation 
disabled) 
data/instruction accesses, 7-11, 7-19, 7-33 
definition, 7-7 
Real numbers, approximation, 3-18 
Record bit (Rc) 
description, 8-3 
inappropriate use, B-3 
Referenced (R) bit maintenance 
page history information, 7-11 
recording, 7-11, 7-38, 7-39, 7-40 
updates, 7-64 
Registers 
configuration registers 
MSR, 2-20 
PVR, 2-23 
exception handling registers 
DAR, 2-29 
DSISR, 2-30 
FPECR (optional), 2-32 
list, 2-19 
SPRG0-SPRG3, 2-30
SRR0/SRR1, 2-31
FPECR register (optional), 2-20 
memory management registers 
BATs, 2-24 
list, 2-19 
SDR1, 2-27
SRs, 2-28 
miscellaneous registers 
DABR (optional), 2-34 
DEC, 2-33 
EAR (optional), 2-35 
list, 2-20 
PIR (optional), 2-36 
TBL/TBU, 2-15 
MMU registers, 7-18 
multiple register loads, B-5 
OEA register set, 2-17 
optional registers 
DABR, 2-34 
EAR, 2-35 
FPECR, 2-32 
PIR, 2-36 
reserved bits, POWER and PowerPC, B-2 
supervisor-level 
BATs, 2-24, 7-25 
DABR, 6-24 
DABR (optional), 2-34 
DAR, 2-29 
DEC, 2-33, B-9 
DSISR, 2-30 
EAR (optional), 2-35 
FPECR (optional), 2-32 
MSR, 2-20 
PIR (optional), 2-36 
PVR, 2-23 
SDR1, 2-27
SPRG0-SPRG3, 2-30
SRR0/SRR1, 2-31
SRs, 2-28 
TBL/TBU, 2-15 
UISA register set, 2-1 
user-level 
CR, 2-5 
CTR, 2-12 
FPR0-FPR31, 2-4
FPSCR, 2-7
GPR0-GPR31, 2-3
LR, 2-11 
TBL/TBU, 2-32 
XER, 2-11, B-4 
VEA register set, 2-13 
Reserved instruction class, 4-6 
Reset exception, 6-4, 6-8, 6-21 
Return from exception handler, 6-19 
rfi (64-bit bridge), 4-63, 8-167 
rlwimi, 4-19, 8-168 
rlwinm, 4-19, 8-169 
rlwnm, 4-19, 8-171 
Rotate/shift instructions, 4-18–4-19, A-16, F-4
Rounding, floating-point operations, 3-25 
Rounding/conversion instructions, FP, 4-25 
RTC (real time clock), B-8 


S


sc 
differences in implementation, POWER and 
PowerPC, B-4 
for context synchronization, 4-8 
occurrence of system call exception, 6-36 
user-level function, 4-52, 4-63, 8-172 
Scalars 
aligned, LE mode, 3-6 
big-endian, 3-2 
description, 3-2 
little-endian, 3-2 
SDR1 register 
definitions, 7-50 
format, 7-50 
generation of PTEG addresses, 7-54, 7-58 
Segment registers 
instructions 
POWER/PowerPC, differences, B-7 
segment descriptor 
definitions, 7-35 
format, 7-35 
SR manipulation instructions, 4-66, A-25
T = 1 format (direct-store), 7-68 
T-bit, 2-28, 7-33 
Segmented memory model, see Memory management 
unit 
Sequential execution model, 4-3 
Shift/rotate instructions, 4-18–4-19, A-16, F-4
Signaling NaNs (SNaNs), 3-21 
Simplified mnemonics 
branch instructions, F-6 
compare instructions, F-3 
CR logical instructions, F-18 
recommended mnemonics, 4-55, F-23 
rotate and shift, F-4 
special-purpose registers (SPRs), F-21 
subtract instructions, F-2 
trap instructions, F-19 
SLB management instructions, 4-67 
slw, 4-20, 8-173 
SNaNs (signaling NaNs), 3-21 
Special-purpose registers (SPRs), F-21 
SPRG0-SPRG3, conventional uses, 2-30
sraw, 4-20, 8-174 
srawi, 4-20, 8-175 
SRR0/SRR1 (status save/restore registers)
format, 2-31
machine check exception, register settings, 6-23 
srw, 4-20, 8-176 
stb, 4-33, 8-177 
stbu, 4-33, 8-178 
stbux, 4-34, 8-179 
stbx, 4-33, 8-180 
stdcx./ldarx
general information, 5-4, E-1 
stfd, 4-40, 8-181 
stfdu, 4-40, 8-182 
stfdux, 4-41, 8-183 
stfdx, 4-40, 8-184 
stfiwx, 4-41, 8-185, D-16 
stfs, 4-40, 8-186 
stfsu, 4-40, 8-187 
stfsux, 4-40, 8-188 
stfsx, 4-40, 8-189 
sth, 4-34, 8-190 
sthbrx, 4-35, 8-191 
sthu, 4-34, 8-192 
sthux, 4-34, 8-193 
sthx, 4-34, 8-194 
stmw, 4-36, 8-195 
Structure mapping examples, 3-3 
stswi, 4-36, 8-196 
stswx, 4-36, 8-197 
stw, 4-34, 8-198 
stwbrx, 4-35, 8-199 
stwcx., 4-53, 4-55, 8-200
stwcx./lwarx
general information, 5-4, E-1
lwarx, 4-55, 8-126
semaphores, 4-53
stwcx., 4-55, 8-200
synchronization primitive examples, E-2 
stwu, 4-34, 8-202 
stwux, 4-34, 8-203 
stwx, 4-34, 8-204 
subf, 4-11, 8-205 
subfc, 4-12, 8-206 
subfe, 4-12, 8-207 
subfic, 4-11, 8-208 
subfme, 4-13, 8-209 
subfze, 4-13, 8-210 
Subtract instructions, F-2 
Summary of changes in this revision, 1-7, 1-15 
Supervisor mode, see Privilege levels 
sync, 4-55, 5-3, 8-211, B-5 
Synchronization 
compare and swap, E-4 
context/execution synchronization, 2-36, 4-8, 6-6 
context-altering instruction, 2-36 
context-synchronizing exception, 2-36 
context-synchronizing instruction, 2-36 
data access synchronization, 2-37 
execution of rfi, 6-19 
implementation-dependent 
requirements, 2-38, 2-39 
instruction access synchronization, 2-38 
list insertion, E-6 
lock acquisition and release, E-5 
memory synchronization instructions, 4-53, A-21 
overview, 6-6 
requirements for lookaside buffers, 2-36 
requirements for special registers, 2-36 
rfi/rfid, 2-37 
synchronization primitives, E-2 
synchronization programming examples, E-1 
synchronizing instructions, 1-11, 2-37 
Synchronous exceptions 
causes, 6-3 
classifications, 6-3 
exception conditions, 6-7 
System call exception, 6-5, 6-36 
System IEEE FP enabled program exception
condition, 6-5, 6-33 
System linkage instructions 
list of instructions, A-23 
rfi, 8-167 
sc, 4-52, 4-63, 8-172 
System reset exception, 6-4, 6-8, 6-21 


T 


Table search operations 
hashing functions, 7-52 
page table algorithm, 7-61 
page table definition, 7-49 
SDR1 register, 7-50
table search flow (primary and secondary), 7-62 
Terminology conventions, xxxv 
Time base 
computing time of day, 2-16 
reading the time base, 2-16 
TBL/TBU, 2-15 
timer facilities, POWER and PowerPC, B-8 
writing to the time base, 2-32 
Tiny values, definition, 3-18 
TLB invalidate 
TLB entry invalidation, B-8 
TLB invalidate broadcast operations, 7-18, 7-64 
TLB management instructions, A-25 
tlbie instruction, 7-18, 7-64 
TLB management instructions, 4-67 
tlbia, 4-68, 8-212 
tlbie, 4-68, 8-213, B-8 
tlbsync, 4-68, 8-214 
tlbsync instruction emulation, 7-64 
TO operand, F-21 
Trace exception, 6-5, 6-37 
Trap instructions, 4-51, F-19 
Trap program exception condition, 6-5, 6-34 
tw, 4-51, 8-215 
twi, 4-51, 8-216 


U 


UISA (user instruction set architecture) 
definition, xxv, 1-5 
general changes to the architecture, 1-16 
programming model, 2-2 
register set, 2-1 
Underflow exception condition, 3-42 
User instruction set architecture, see UISA 
User mode, see Privilege levels 
User-level registers, list, 2-2, 2-14 


V


VEA (virtual environment architecture) 
cache model and memory coherency, 5-1 
definition, xxv, 1-5 
general changes to the architecture, 1-16
programming model, 2-14 
register set, 2-13 
time base, 2-15 

Vector offset table, exception, 6-4 
Virtual address
formation, 2-29
Virtual environment architecture, see VEA
Virtual memory
implementation, 7-3
virtual vs. physical memory, 5-1


W


WIMG bits, 5-6, 7-65 
description, 5-12 
G-bit, 5-16 
in BAT register, 7-26 
in BAT registers, 5-13 
WIM combinations, 5-15 
Write-back mode, 5-14 
Write-through attribute (W) 
write-through/write-back operation, 5-6, 5-13 


X 


XER register 
bit definitions, 2-11 
difference from POWER architecture, B-4 
xor, 4-16, 8-217 
XOR (exclusive OR), 3-6 
xori, 4-16, 8-218 
xoris, 4-16, 8-219 


Z 


Zero divide exception condition, 3-38 
Zero numbers, format, 3-20 
Zero values, 3-20 